Aalto Scientific Computing
This site contains documentation about scientific and data-intensive computing at Aalto University and beyond. It is targeted towards Aalto researchers, but has some useful information for everyone.
Welcome, researchers!
Welcome to Aalto, researchers. Aalto has excellent resources for you, but it can be quite hard to know about them all. These pages provide an overview of IT services for researchers (focused on computation and data-intensive work, including experimental work).
See also
These aren’t generic IT instructions - ITS has an introduction for staff somewhere (but apparently not online).
IT Services for Research is the comprehensive list of researcher-oriented IT services available (compared to this which is a starting tutorial)
What file storage to use? - good summary not focused on scientific computing.
Aalto services
Understanding all the Aalto services can be quite confusing. Here are some of the key players:
Department IT: Only a few departments (mainly in SCI) have their own IT staff. Others have people such as laboratory managers which may be able to provide some useful advice. Known links: CS, NBE, PHYS, Math.
Science-IT: Overlaps with SCI department IT groups. They run the Triton cluster and support scientific computing. Their services may be used throughout the entire university, but support is organized from the departments which fund them. The core Science-IT departments are CS, NBE, and PHYS. Science-IT runs a daily SciComp garage, where we provide hands-on support for anything related to scientific computing. This site (scicomp.aalto.fi) is its main home; read more about us on the about page.
Aalto Research Software Engineers provide specialized services in computation, data, and software. If you ever think “I can’t do X because we don’t have the skills” or “I wish we could be more efficient”, realize you aren’t alone and open a request with us. Our projects last days to months, longer than typical support staff’s projects.
Aalto IT Services (ITS): Provides central IT infrastructure. They have an “IT Services for Research” group, but it is less specialized than Science-IT. ITS is the first place to contact for non-specialized services, or for people outside SCI. Their infrastructure is used in all schools including SCI, and is the base on which everyone builds. Their instructions are on aalto.fi, most importantly the already-mentioned IT Services for Research page. Contact via servicedesk.
Aalto Research Services: Administrative-type support. Provides support for grantwriting, innovation and commercialization, sponsored projects, legal services for research, and research infrastructures. (In 2019 a separate “innovation services” split from the previous “research and innovation services”).
CSC is the Finnish academic computing center (and more). They provide a lot of basic infrastructure you use without knowing it, as well as computing and data services to researchers (all for free). research.csc.fi
Information is scattered everywhere; the major sources are:
aalto.fi is the normal homepage, but the joke is it’s hard to find anything and hard to use. This site is “not designed to have a logical structure and instead, you are expected to search for information” (actual quote). Some pages have more information appear if you log in, and there is no indication of which ones. In general, unless you know what you are looking for, don’t expect to find anything here without extensive work.
wiki.aalto.fi is obviously the Aalto wiki space. Anyone can make a space here, and many departments’ internal sites are here. Searching can randomly find useful information, but it is not a primary information source anymore. Most sites aren’t publicly searchable.
scicomp.aalto.fi is where you are now. It has a lot of information related to scientific computing and data. We try to not duplicate what is on aalto.fi, but sometimes we elaborate or make things more findable. This might be the best place to find information on specialized research and scientific computing - as opposed to general “staff computing” you find other places.
Computers, devices, and end-user systems
Aalto provides computers to its employees, obviously. Whether it is an Aalto-wide managed system or a standalone one depends on your department’s policies. If it’s standalone, you are on your own. If managed, login is through your Aalto account. You can get a laptop or desktop, and Linux, Mac, or Windows.
Desktops are connected directly to the wired networks and are typically preferred by researchers doing serious data work or computation. Linux desktops have fast and automatic access to all of the university data storage systems, including Triton and department storage. They also have a wide variety of scientific software already available (somewhat similar to Triton). We have some limited instructions and pointers to the main instructions for Mac and Windows computers.
Managed laptops are usable in and out of the Aalto networks.
On both managed desktops and laptops you can become a “primary user”, which allows you to install needed software from the official repositories. Additionally, in some cases a Workstation Administrator (wa) account can be given, which is close to a normal root/Administrator account with some limitations. The “primary user” concept is widely accepted and recommended by Aalto ITS for all users, while wa accounts are regulated by department policies or Aalto ITS.
Computing
With a valid Aalto account, you have two primary options: workstations and Triton. The Aalto workstations have basic scientific software installed.
The most demanding computing at Aalto is performed on Triton, the Aalto high-performance computing cluster. It is a fairly standard medium-sized cluster, and its main advantage is its close integration into the Aalto environment: it shares Aalto accounts, its data storage (5PB) is also available on workstations, and it has local support. If you need dedicated resources, you can purchase them and have them managed by the Science-IT team as part of Triton, so that you get dedicated resources and can easily scale to the full power of Triton. Triton is part of the Finnish Grid and Cloud Infrastructure and is the largest publicly known computing cluster in Finland after the CSC clusters. Triton provides a web-based interface via JupyterHub and Open OnDemand. To get started with Triton, request access and check the tutorials sequence (or the quickstart guide if you know the basics); you’ll learn all you need.
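As a taste of the workflow, here is a minimal sketch of connecting and submitting a batch job (Triton uses the Slurm scheduler; the resource values and file names are illustrative, see the tutorials for current details):
# Connect (from the Aalto network or via VPN):
ssh yourusername@triton.aalto.fi
# Write a minimal Slurm batch script, submit.sh:
#   #!/bin/bash
#   #SBATCH --time=00:10:00
#   #SBATCH --mem=1G
#   srun python3 myscript.py
# Submit it and watch the queue:
sbatch submit.sh
squeue -u $USER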
CSC (the Finnish IT Center for Science) is a government-owned organization which provides a lot of services, most notably huge HPC clusters, data, and IT infrastructure services to the academic sector. All of their services are free to the academic community (paid directly by the state of Finland). They also coordinate the Finnish Grid and Cloud Infrastructure. They have the largest known clusters in Finland.
Data
Data management isn’t just storage: if data is just put somewhere, you get a massive mess, and the data won’t be usable in even 5 years. Funders now require “data management plans”, so data management is not just a hot topic, it’s an important one. We have a whole section on data (not maintained much anymore), and there are also higher-level guides from Aalto. If you just want to get something done, start with our Aalto-specific guideline for Science-IT data storage (used in CS, NBE, PHYS); if you follow our plan, you will be doing better than most people. If you have specific questions, there is an official service email address you can use (see the Aalto pages), or you can ask the Science-IT team.
Aalto has many data storage options, most of them free. In general, you should put your data in some centralized location shared with your group: if you keep it only on your own systems, the data dies when you leave. We manage data by projects: a group of people with shared access and a leader. Groups provide flexibility, sharing, and long-term management (so that you don’t lose or forget about data every time someone leaves). You should request as many projects as you need, depending on how fine-grained your access control needs to be, and each can have its own members and quota. You can read a general guide from Aalto (going beyond scientific computing) about the storage locations available and the storage service policy.
Triton has 5PB of non-backed-up data storage on the high-performance Lustre filesystem. This is used for large, active computation purposes. The Triton nodes have incredible bandwidth to it, and it is very fast and parallel. It is mounted by default at Science-IT departments, and can be mounted in other departments too.
Aalto provides “work” and “teamwork” centralized filesystems, which are large, backed up, snapshotted, and shared: everything you may want. Within the Science-IT departments, Science-IT and department IT manage them and provide access. For other schools/departments, both are provided by Aalto ITS, but you will have to figure out your school’s policies yourself. It’s possible to hook this storage into whatever else you need over the network. (In general, “work” is organized by the Aalto hierarchy, while “teamwork” is flatter. If you consider yourself mainly Aalto staff who fits in the hierarchy, work is probably better. If you consider yourself a researcher who collaborates with whomever, teamwork is better.) Teamwork instructions
CSC provides both high-performance Lustre filesystems (like Triton) and archive systems. CSC research portal.
In our data management section, we provide many more links to long-term data repositories, archival, and so on. The fairdata.fi project is state-supported and has a lot more information on data. They also provide some data storage focused on safety and longer-term storage (like IDA), though these are not widely used at Aalto because we provide such good services locally.
Aalto provides, with Aalto accounts, Google Drive (unlimited, also Team Drives), Dropbox (unlimited), and Microsoft OneDrive (5TB). Be aware that once you leave Aalto, this data will disappear!
Software
Triton and Aalto Linux workstations come with a lot of scientific software installed, within the Lmod system. Triton generally has more. If you need something, it can be worth asking us first, so we can install it for everyone.
If you are the primary user of a workstation, you can install Ubuntu packages yourself (and if you aren’t, you should ask to be marked as primary user). If you use Triton or are in a Science-IT department, it can be worth asking Science-IT about software you need - we are experts in this and working to simplify the mess that scientific software is. Windows workstations can have things automatically installed, check the windows page.
Triton and Aalto workstations have the central software available; currently, for laptops, you are on your own except for some standard stuff.
On Triton and Linux workstations, type module spider $name to search for available software. We are working to unify the software stack available on Triton and Aalto workstations so that they have all the same stuff.
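For example (the package names are illustrative):
module spider tensorflow   # search the software stack for a name
module load anaconda       # load a module found in the search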
ITS has a software and licenses (FI) page, and also a full list of licenses (broken link, missing on new page). There is also https://download.aalto.fi/.
CSC also has a lot of software. Some is on CSC computers, some is exported to Triton.
Starting a project
Each time you start a project, it’s worth putting a few minutes into planning so that you create a good base (and don’t end up with chaos in a few years). We don’t mean some grant, we mean a line of work with a common theme, data, etc.
Think about how you’ll manage data. It’s always easy to just start working, but it can be worth getting all project members on the same page about where data will be stored and what you want to happen to it in the end. Having even a very short written plan also helps a lot when getting newcomers started. The “practical DMP” section here can help a lot - try filling out that A4 page to consider the big sections.
Request a data group (see above) if you don’t already have a shared storage location. This will keep all of your data together, in the same place. As people join, you can easily give them access. When people leave, their work isn’t lost.
If you already have a data group that is suitable (similar members), you can use that. But there’s no limit to the number of projects, so think about whether it’s better to keep things separate from the start.
Mail your department IT support and request a group. Give the info requested at the bottom of data outline page.
In the same message, request the different data storage locations, e.g. scratch, project, archive. Quotas can always be increased later.
If you need specialized support in computing, data, or software, request a consultation with Aalto Research Software Engineers.
Training
Of course you want to get straight to research. However, we come from a wide range of backgrounds and we’ve noticed that missing basic skills (computer as a tool) can be a research bottleneck. We have constructed a multi-level training plan, Hands-on Scientific Computing so that you can find the right courses for your needs. We have extensive internal training about practical matters not covered in academic courses. These courses are selected by researchers for researchers, so we make sure that everything is relevant to you.
Check our upcoming training page for a list of upcoming courses. If you do anything computational or code-based at all, you should consider the twice-yearly CodeRefinery workshops (announced on our page). If you have a Triton account or do high-performance computing or intensive computing or data-related tasks, you should come to the Summer (3 days) or Winter (1 day) kickstart, which teaches you the basics of Triton and HPC usage (we say it is “required” if you have a Triton account).
Other notes
Remember to keep the IT Services for Research and What file storage to use? pages close at hand.
Research is usually collaborative, but sometimes you can feel isolated - either because you are lost in a crowd, or far away from your colleagues. Academic courses don’t teach you everything you need to be good at scientific computing - put some effort into working together with, learning from, and teaching your colleagues and you will get much further.
There are some good cheatsheets which our team maintains. They are somewhat specialized, but useful in the right places.
It can be hard to find your way around Aalto; the official campus maps and directions are known for being confusing. Try UsefulAaltoMap instead.
Welcome, students!
See also
Primary information is at Aalto’s IT Services for Students page, which focuses on basic services. This focuses on students in computing and data intensive programs.
Welcome to Aalto! We are glad you are interested in scientific computing and data. scicomp.aalto.fi may be useful to you, but it is somewhat targeted at research usage. Still, it can serve as a good introduction to resources for scientific and data-intensive computing at Aalto. This page is devoted to resources which are available to students.
If you are involved in a research group or doing research for a professor/group leader, you are a researcher! You should acquaint yourself with all information on this site, starting with Welcome, researchers!, and use whatever you need.
General IT instructions can be found at https://www.aalto.fi/en/it-help. There used to be some on into.aalto.fi, but these are gone now. There also used to be a 2-page PDF introduction for students, but it also seems to be gone from online. IT Services for Students is now the best introduction.
Accounts
In general, your Aalto account is identical to that which researchers have — the only difference is that you don’t have a departmental affiliation.
Getting help
As a student, the ITS servicedesks are the first place to go for help. The site https://www.aalto.fi/en/it-help is the new central site for IT instructions.
This site, https://scicomp.aalto.fi, is intended for research scientific computing support but has a few pages useful to you.
Computation
As a student, you have access to various light computational resources which are suitable for most courses that need extra power:
Paniikki computer lab: Linux workstations, GPUs, software via modules
Other computer labs: workstations, different OSs
Shell servers (e.g. brute, force, kosh, lyta): via ssh, software via modules, often overcrowded. Brute and Force are for computation, others not.
Jupyter at https://jupyter.cs.aalto.fi: basic software, in web browser
VDI at https://vdi.aalto.fi: Windows and Linux
Own computers: software at https://download.aalto.fi
The Jupyter service at https://jupyter.cs.aalto.fi is available to everyone with an Aalto account. It provides at least basic Python and R software; we try to keep it up to date with the things people need most for courses that use programming or data.
The shell servers brute and force are for light computing, and are generally for students. You may find them useful, but they can often be overloaded. See Light computing shell servers, and learn how to launch a Jupyter notebook there.
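Connecting works over ssh with your Aalto account; a minimal sketch, assuming the hostname brute.aalto.fi:
ssh yourusername@brute.aalto.fi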
For GPU computing, the Paniikki Linux computer lab (map) has GPUs in all workstations. Software is available via module spider $name to search and module load $name to load (and the module anaconda has Python, tensorflow, etc.). Read the Paniikki cheatsheet here. The instructions for Aalto workstations sort of apply there as well. The software on these machines is managed by the Aalto-IT team. This is the place if you need to play with GPUs, deep learning, etc., and it helps you transition to serious computing on large clusters.
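As a quick sanity check on a Paniikki machine, you can load the anaconda module and ask TensorFlow which GPUs it sees (a sketch assuming a TensorFlow 2 installation; the exact call depends on the installed version):
module load anaconda
python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'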
A new (2018) remote desktop service is available at https://vdi.aalto.fi (instructions). It provides Windows and Linux desktops and is designed to replace the need for computer classrooms with special software installed. You can access it via a web browser or the VMware Horizon client. More VDI Windows workstations are also available at http://mfavdi.aalto.fi/.
Triton is for research purposes; students can’t get access unless they are affiliated with a research project or (in very rare cases) a course makes special arrangements.
Data storage
Aalto home directories have a 100GB quota, and this is suitable for small use. Note that files here are lost once you leave Aalto, so make sure you back up.
The What file storage to use? page lists basic services which may be useful for data storage. Of the cloud services, note that everyone at Aalto can get an unlimited Google Drive account through the Aalto Google Apps service: instructions. Your Aalto Google account will expire once you are no longer affiliated, so your files there will become inaccessible.
Software
ITS has a software and licenses (FI) page, and also a full list of licenses. There is also http://download.aalto.fi/. Various scientific software can be found for your own use via the Aalto software portals.
The Lmod (module) system provides more software on brute/force and in Paniikki. For example, to access a bunch of scientific Python software, you can do module load anaconda. The researcher-focused instructions are here, but like many things on this site, you may have to adapt them to the student systems.
Common software:
Python: via module load anaconda (see above)
Tensorflow etc. packages: same as Python, in Paniikki
Other notes
It can be hard to find your way around Aalto; the official campus maps and directions are known for being confusing. Try UsefulAaltoMap instead.
Do you have suggestions for this page? Please open an issue on GitHub (make sure you have a good title that mentions that the audience is students, so we can put the information in the right place). Better yet, send us a pull request yourself.
News
22/03/2024 We have a new exciting course coming! Best practices in HPC! More info below
Upcoming Courses
April-May/2024 Tuesdays Tools and Techniques for High Performance Computing (TTT4HPC). More info and registrations at this page. A new course on HPC practices! Four self-contained episodes. Pick the one which you need the most or join us for all of them!
Join our daily zoom garage for any scientific computing related issue (not just Triton!) or to just chat and feel part of the community. Follow us on Mastodon.
News archive
24/01/2024 Triton users’ group meeting. This is a special event for all the users of our cluster: come and hear what’s new with Triton. This is also a good moment to express your wishes for future developments.
16-18/01/2024 Linux Shell Scripting course. More info and registrations at this page. Please also visit our training webpages to check other upcoming courses or request a re-run of past hands-on courses.
22/11/2022 From the 22nd till the 25th of November we will be running our popular Python for Scientific Computing course again.
21/10/2022 Upgrade of the Triton login node: After running out of memory several times on our login node, we upgraded the memory from the previous 128GB to 256GB. This is hopefully sufficient for most compilation and development work happening on the node. Any computation- or memory-intensive job should still be run on the compute nodes, but this upgrade provides us with a more robust system.
12/09/2022 September 2022, it is another academic year! We have CodeRefinery starting in one week: if you write code for research, then this is the workshop for you! Come and learn about git, jupyter, conda, reproducibility and much more. Click here for CodeRefinery Fall 2022 registration and info page.
9/06/2022 Join us today on Twitch.tv at 12:00 EEST for our Intro to Scientific Computing and HPC. The course is open to anyone with an internet connection. If you want to do the hands-on exercises with us, you need access to an HPC cluster. If you are at Aalto please apply for access to the triton cluster, otherwise check what is available at your institution. You can also watch without doing the practical parts, but we recommend registering anyway so you will be able to ask questions on HackMD.
17/01/2022 Join us for our next Twitch.tv courses dedicated to the basics of scientific computing and HPC: 2/Feb/2022 Intro to Scientific Computing and 3-4/Feb/2022 Intro to High Performance Computing. The course is open to anyone with an internet connection. For day 2+3 you need access to an HPC cluster. If you are at Aalto please apply for access to the triton cluster, otherwise check what is available at your institution. You can also watch without doing the practical parts, but we recommend registering anyway so you will be able to ask questions on HackMD.
8/09/2021 Research Software Hour Twitch show is back at a different time. Join us today at 15:00 to talk about “Computers for research 101: The essential course that everyone skipped”.
9/8/2021 We are back from the summer break. Our zoom garage schedule is back to normal (every day at 13:00).
7-9/06/2021 New Triton user? Join our course on how to use Triton and HPC https://scicomp.aalto.fi/training/scip/summer-kickstart/
10/05/2021 CodeRefinery online workshop starts today. Tune in for git intro part 1. If you did not register, you can watch via Twitch: https://www.twitch.tv/coderefinery
01/04/2021 April fools’ … NOT: no jokes but instead a reminder that we have new courses starting in April “Hands on Data Anonymization” and “Software Design for Scientific Computing”. More info and registration links at https://scicomp.aalto.fi/training/
19/03/2021 Linux Shell Scripting starts next week! There is still time to register at: https://scicomp.aalto.fi/training/scip/shell-scripting/
15/02/2021 We have a new login node and new software versions on Triton for: abinit, anaconda, cuda, julia, and quantum espresso. Read more at our issue tracker. We recommend following the issue tracker for live updates from us and from our users too!
14/01/2021 Save the date: 29 January 2021: Crash course on Data Science workflows at Aalto + Linux terminal basics in preparation for 1-2 February 2021: Triton Winter Kickstart. Registration links can be found within the course pages. The Kickstart course is highly recommended to new Triton HPC users.
10/12/2020 We are updating and consolidating our tutorials and guidelines on https://scicomp.aalto.fi website. There might be temporary broken links, please let us know if you spot anything that does not look as it should. Please note that the next Research Software Hour on https://twitch.tv/RSHour will be on Thursday 17/12 at 21:30 Helsinki time. A special episode about Advent of Code 2020.
02/12/2020 This week Research Software Hour on https://twitch.tv/RSHour will happen during the day, straight from the https://nordic-rse.org/ meeting! 13:30 Helsinki time: All you wanted to know about the Rust programming language! Past episodes at Research Software Hour .
26/11/2020 Today at 21:30 Helsinki time, join us for another live episode of Research Software Hour on https://twitch.tv/RSHour Tonight: code debugging! Past episodes at Research Software Hour .
19/11/2020 Our course on Matlab Basics finishes today. Videos from the course will be uploaded to the Aalto Scientific Computing YouTube channel. See the course webpage for more info.
10/11/2020 Our course on Matlab Basics starts today. See the course webpage for more info.
29/10/2020 Today at 21:30 Helsinki time, join us for another live episode of Research Software Hour on https://twitch.tv/RSHour Tonight: git-annex to version control your data and HPC cluster etiquette.
26/10/2020 Tomorrow day 4 of our online CodeRefinery workshop. Materials are available here https://coderefinery.github.io/2020-10-20-online and if you did not register, you can watch it live at https://www.twitch.tv/coderefinery.
21/10/2020 Today day 2 of our online CodeRefinery workshop. Materials are available here https://coderefinery.github.io/2020-10-20-online and if you did not register, you can watch it live at https://www.twitch.tv/coderefinery.
20/10/2020 Today day 1 of our online CodeRefinery workshop. Come and learn about version control, jupyter, documentation. Materials are available here https://coderefinery.github.io/2020-10-20-online and if you did not register, you can watch it live at https://www.twitch.tv/coderefinery.
19/10/2020 Today “Triton users group meeting”: come and hear about the future of Triton/ScienceIT/Aalto Scientific Computing, exciting news on new services, new hardware (GPUs!), and anything related to Aalto Scientific Computing.
16/10/2020 Today the fourth and last part of our course on Data analysis workflows with R and Python. You can watch it on the CodeRefinery Twitch channel.
14/10/2020 Today our course on Data analysis workflows with R and Python continues. You can watch it on CodeRefinery Twitch channel. Please note that the last part of the course is on Friday 16/10/2020.
13/10/2020 Tomorrow our course on Data analysis workflows with R and Python continues. You can watch it on CodeRefinery Twitch channel.
06/10/2020 Today is Tuesday, however Research Software Hour has now moved from Tuesdays to Thursdays. Tune in on Twitch on Thursday October 15 at 21:30 (Helsinki time) to watch live the next episode.
05/10/2020 Today starts our Data analysis workflows with R and Python. You can watch it on CodeRefinery Twitch channel.
29/09/2020 - Join us tonight (21:30 Helsinki time), for Research Software Hour, a one hour interactive discussion with Radovan Bast and Richard Darst. Tonight how to organise research software projects and other tips to keep track of notes, tools, etc.
28/09/2020 – Friendly reminder that you can still register for our Data analysis workflows with R and Python. Link to registration is here. Also save the date: Mon 19/10/2020 at 14:00 “Triton users group meeting”, come and hear about the future of Triton/ScienceIT/Aalto Scientific Computing, exciting news on new services, new hardware (GPUs!), and anything related to Aalto Scientific Computing. More details coming soon.
25/09/2020 – Friendly reminder that you can still register for our Data analysis workflows with R and Python. Link to registration is here.
24/09/2020 – Join our informal chat about research software on zoom at 10:00: RSE activities in Finland. Today is also the SciComp garage day focused on HPC/Triton issues: daily garage.
23/09/2020 – Last day of our course on “Python for Scientific Computing” covering packaging and binder. It can also be watched live on CodeRefinery Twitch if you did not have time to register.
22/09/2020 – Join us tonight (21:30 Helsinki time), for Research Software Hour, a one hour interactive discussion with Radovan Bast and Richard Darst. Tonight we cover command line arguments and running things in parallel. You can watch RSH past episodes on YouTube to get an idea of the topics covered.
21/09/2020 – This week is the last week of our course on “Python for Scientific Computing” You can re-watch the lessons on CodeRefinery Twitch channel
14/09/2020 – Our course on “Python for Scientific Computing” has started today. It can also be watched live on CodeRefinery Twitch if you did not have time to register.
08/09/2020 – “Research Software Hour” will start on 22/09/2020. RSH is an interactive, streaming web show all about scientific computing and research software. You can watch past episodes at the RSH video archive on youtube.
xx/09/2020 – We started a small News section to keep users up to date and avoid missing important things coming up. Check our trainings coming in October and November. Join our daily garage if you have issues to discuss related to computing or data management.
The Aalto environment
Aalto provides a wide variety of support for scientific computing. For a summary, see the IT Services for Research page. For information about data storage at Aalto, see the section on data management below.
Aalto tools
For more services provided at the Aalto level, see the IT Services for Research page.
Aalto account
Extension to Aalto account and email
Aalto account expiration is bound to staff or student status. The account closes one week after the affiliation with Aalto University ends. Expiration is managed completely by Aalto IT Services; department IT staff cannot extend Aalto accounts.
If an account extension is needed, this may be achieved with a visitor contract. The contract requires host information, so you should contact your supervisor, who (if accepting your request) contacts HR with the needed details to prepare the official visitor contract.
Aalto Linux
See also
https://linux.aalto.fi/ provides official information on Aalto Linux for all Aalto. This page is a bit focused on the Science-IT departments, but also useful for everyone.
Aalto Linux is provided to all departments in Aalto. Department IT co-maintains this, and in some departments provides more support (specifically, CS, NBE, PHYS and Math at least). It contains a lot of software and features to support scientific computing and data. Both laptop and desktop setups are available.
This page is mainly about the Linux flavor in CS/PHYS/NBE and partly Math, co-managed by these departments and Science-IT. Most of it is relevant to all Aalto, though.
Basics
Aalto home directory. In the Aalto Ubuntu workstations, your home directory will be your Aalto home directory. That is, the same home directory that you have in Aalto Windows machines and the Aalto Linux machines, including shell servers (kosh, taltta, lyta, brute, force).
Most installations have Ubuntu 16.04 or 18.04; 20.04 is coming soon.
A pretty good guide is available at https://linux.aalto.fi.
Login is with Aalto credentials. Anyone can log in to any computer. Since login is tied to your Aalto account, login is tied to your contract status. Please contact HR if you need to access systems after you leave the university or your account stops working due to contract expiration.
All systems are effectively identical, except for local Ubuntu packages installed. Thus, switching machines is a low-cost operation.
Systems are centrally managed using puppet. Any sort of configuration group can be set up, for example to apply custom configuration to one group’s computers.
Large scientific computing resources are provided by the Science-IT project. The compute cluster there is named Triton. Science-IT is a school of science collaboration, and its administrators are embedded in NBE, PHYS, CS IT.
Workstations are on a dedicated network VLAN. The network port must be configured before a computer can be used on it, so you can’t just assume that you can move your computer anywhere else. You can request other network ports to be enabled for personal computers; just ask.
Installation is fully automated via netboot. Once configuration is set up, you can reboot and PXE boot to get a fresh install. There is almost no local data (except the filesystem for tmp data on the hard disks, /l/ below, which is not used for anything by default), so reinstalling is a low-cost operation. The same should be true for upgrading: once the new OS is ready, you reboot and netinstall. Installation takes less than two hours.
Default user interface. The new default user interface for Aalto Linux is Unity. If you want to switch to the previous default interface (Gnome), before logging in please select “Gnome Flashback (Metacity)” by clicking the round ubuntu logo close to the “Login” input field.
Personal web pages. What you put under ~/public_html will be visible at https://users.aalto.fi/~username. See Filesystem details.
When requesting a new computer:
Contact your department IT
Let us know who the primary user will be, so that we can set this properly.
When you are done with a computer:
Ensure that data is cleaned up. Usually disks will be wiped, but if this is important, you must explicitly confirm it before you leave. There may be data left if you used the workstation’s local disks (not the default). There is also a local cache ($XDG_CACHE_HOME), which stores things such as the web browser cache. Unix permissions protect all data even if the primary user changes, but it is better to be safe than sorry. Contact IT if you want wipes.
Laptops
You can get laptops with Linux on them.
Each user should log in the first time while connected to the Aalto network. This will cache the authentication information, then you can use it wherever you want.
Home directories can be synced with the Aalto home directories. This is done using unison. TODO: not documented, what about this?
If you travel, make sure that your primary user is set correctly before you go. The system configuration can’t be updated remotely.
Otherwise, the environment is like the workstations. You don’t have access to the module system, though.
If the keychain password no longer works: see the FAQ at the bottom.
Workstations
Most material on this page defaults to the workstation instructions.
Primary User
The workstations have a concept of the “primary user”. This user can install software from the existing software repositories and ssh remotely to the desktops.
Primary users are implemented as a group with the name $hostname-primaryuser. You can check the primary user of a computer by using getent group $hostname-primaryuser, or check your own primary-userness with groups (see the example after this list).
If you have a laptop setup, make sure you have the PrimaryUser set! This can’t be set remotely.
Make sure to let us know about primary users when you get a new computer set up or change computers. You don’t have to, but it makes things convenient for you.
It is not currently possible to have group-based primary users (a group of users all have primary user capabilities across a whole set of computers, which would be useful in flexible office spaces). TODO: are we working on this? (however, one user can have primary user access across multiple computers, and hosts can have multiple primary users, but this does not scale well)
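For example, checking primary-user status (hypothetical hostname mymachine):
getent group mymachine-primaryuser        # who are the primary users of mymachine?
groups | tr ' ' '\n' | grep primaryuser   # am I a primary user on this machine?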
Data
See the general storage page for the full story (this is mainly oriented towards Linux). All of the common shared directories are available on department Linux by default.
We recommend that most data is stored in shared group directories, to provide access control and sharing. See the Aalto data page.
You can use the program unison or unison-gtk to synchronise files.
Full disk encryption (Laptops)
All new (Ubuntu 16.04 and 18.04) laptops come with full disk encryption by default (instructions). This is a big deal and quite secure, if you use a good password.
When the computer is first turned on, you will be asked for a disk encryption password. Enter something secure and remember it - you have only one chance. Should you want to change this password, take the computer to an Aalto ITS service desk. They can also add more passwords for alternative users on shared computers. Aalto ITS also has a backup master key. (If you have local root access, you can do this with cryptsetup, but if you mess up there’s nothing we can do.)
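For reference, the root route looks roughly like this. A sketch only: the device path is an assumption (check with lsblk), and a mistake can lock you out permanently, so prefer the ITS service desk:
# Assuming the encrypted partition is /dev/sda3:
sudo cryptsetup luksChangeKey /dev/sda3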
Desktop workstations do not have full disk encryption, because data is not stored directly on them.
Software
Already available
Python: module load anaconda (or anaconda2 for Python 2) (desktops).
Matlab: automatically installed on desktops, Ubuntu package on laptops.
Ubuntu packages
If you have PrimaryUser privileges, you can install Ubuntu packages in one of the following ways:
By going to the Ubuntu Software Center (Applications -> System Tools -> Administration -> Ubuntu Software Centre). Note: some software doesn’t appear here! Use the next option.
By running aptdcon --install $ubuntu_package_name (search for packages using apt search).
By requesting IT to make a package available across all computers as part of the standard environment. Help us to create a good standard operating environment!
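For example, the command-line option in practice (htop as an illustrative package):
apt search htop          # find the package name
aptdcon --install htop   # install it; you authenticate with your own password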
The module system
The command module provides a way to manage various installed versions of software across many computers. This is the way that we install custom software and newer versions of software, if it is not available in Ubuntu. Note that these are shell functions that alter environment variables, so this needs to be repeated in each new shell (or automated in login).
Note: The modules are only available on Aalto desktop machines, not on laptops.
See the Triton module docs for details.
module load triton-modules will make most Triton software available on Aalto workstations (otherwise, most of it is hidden).
module avail lists all available packages.
module spider $name searches for a particular name.
module load $name loads a module. This adjusts environment variables to bring various directories into PATH, LD_LIBRARY_PATH, etc.
We will try to keep important modules synced across the workstations and Triton, but let us know if something is missing.
Useful modules:
anaconda and anaconda2 will always be kept up to date with the latest Python Anaconda distribution, and we’ll try to keep this in sync across Aalto Linux and Triton.
triton-modules: a metamodule that makes other Triton software available.
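A typical session on an Aalto Linux desktop might look like this (the searched name is illustrative):
module load triton-modules   # expose the Triton software stack
module avail                 # list everything now available
module spider gromacs        # search for a particular package
module load anaconda         # load the scientific Python stack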
Admin rights
Most times you don’t need to be an admin on workstations. Our Linux systems are centrally managed with non-standard improvements and features, and 90% of cases can be handled using existing tools:
Do you want to:
Install Ubuntu packages: use aptdcon --install $package_name as primary user.
This website tells me to run sudo apt-get to install something: don’t; use the instructions above.
This website gives me some random instructions involving sudo to install their program: these are not always a good idea to run, especially since our computers are networked and centrally managed, and these instructions don’t always work. Sometimes these things can be installed as a normal user with simple modifications; sometimes their instructions will break our systems. In this case, try to install as a normal user and then send a support request first. If none of these work and you have studied enough to understand the risk, you can ask us. Make sure you give details of what you want to do.
I need to change network or some other settings: desktops are bound to a certain network, and settings can’t be changed, users can’t be managed, etc.
It’s a laptop: then yes, there are slightly more cases you need this, but see above first.
I do low-level driver, network protocol, or related systems development. Then this is a good reason for root, ask us.
If you do have root and something goes wrong, our help is limited to reinstalling (wiping all data - note that most data is stored on network drives anyway).
If you do need root admin rights, you will have to fill out a form and get a new wa account, which Aalto has to approve. Contact your department IT to get the process started.
Remote access to your workstation
If you are the primary user, you can ssh to your own workstation from certain Aalto servers, including at least taltta. See the remote access page.
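For example, a minimal sketch assuming your workstation is named mymachine and taltta’s full hostname is taltta.aalto.fi:
# Hop via taltta to your own workstation:
ssh -J yourusername@taltta.aalto.fi yourusername@mymachine.aalto.fi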
More powerful computers
There are different options for powerful computing.
First, we have desktop Linux workstations that are more powerful than normal. If you want one of these, just ask. They include a medium-power GPU card. You can buy a more powerful workstation if you need, but…
Beyond that, we recommend using Triton rather than building your own servers, which would only be used part-time. You can either use Triton as-is for free, or pay for dedicated hardware for your group. Having your own hardware as part of Triton means you can use all of Triton, and even CSC if needed, with little extra work. You could have your own login node, or resources as part of the queues.
Triton is Aalto’s high-performance computing cluster. It is not a part of the department Linux, but is heavily used by researchers. You should see the main documentation at the Triton user guide, but for convenience some is reproduced here:
Triton is CentOS (compatible with the Finnish Grid and Cloud Infrastructure), while CS workstations are Ubuntu. So, they are not identical environments, but we are trying to minimize the differences.
Since it is part of FGCI, it is easy to scale to more power if needed.
We will try to have similar software installed in workstation and Triton module systems.
The paths /m/$dept/ are designed to be standard across computers.
The project and archive filesystems are not available on all Triton nodes. This is because they are NFS shares, and if someone starts a massively parallel job accessing data from them, it will kill performance for everyone. Since history shows this will eventually happen, we have not yet mounted them across all nodes. They are mounted on the login nodes, certain interactive nodes, and dedicated group nodes. TODO: make this actually happen.
Triton was renewed in 2016 and late 2018.
All info in the triton user guide
Common problems
Graphical User Interface on Aalto CS Linux desktop is sluggish, unstable or does not start
Check your disk quota from the terminal with the command quota. If you are not able to log in to the GUI, you can change to a text console with the CTRL+ALT+F1 key combo and log in from there; the GUI login is at CTRL+ALT+F7.
If you are running low on quota (the blocks count is close to the quota), you should clean up some files and then reboot the workstation to try the GUI login again.
You can find out what is consuming quota from the terminal with the command: bash -c 'cd && du -sch .[!.]* * | sort -h'
Enter password to unlock your login keyring
You should change your Aalto password on your main Aalto workstation. If you change the password through e.g. https://password.aalto.fi, your workstation’s password manager (keyring) does not know the new password and asks you to input the old Aalto password.
If you remember your old password, try this:
Start application Passwords and Keys (“seahorse”)
Click the “Login” folder under “Passwords” with right mouse button and select “Change password”
Type in your old password to the opening dialog
Input your current Aalto password to the “new password” dialog
Reboot the workstation / laptop
If changing password didn’t help, then try this:
Instead of selecting “change password” from the right-click menu, select “delete” and reboot the workstation. When logging in, the keyring application should use your login key automatically.
In Linux, some process is stuck and freezes the whole session
You can kill a certain (own) process via text console.
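For example, after switching to a text console with CTRL+ALT+F1 and logging in:
ps -u $USER -o pid,comm   # find the PID of the stuck process
kill <pid>                # ask it to exit cleanly
kill -9 <pid>             # force-kill if it ignores that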
How do I use eJournals, Netmot and other Aalto library services from home?
There is a weblogin possibility at Aalto Library. After this, all library-provided services are available. There are links for journals (nelli) and netmot. Or use the VPN, which should already be configured.
Rsync complains about Quota, even though there is plenty left.
The reason usually is that the default rsync -av tries to preserve the group, so the wrong group ends up on the target. Try using rsync -rlptDxvz --chmod=Dg+s <source> <target>. This will make the group setting correct on /scratch/ etc., and quota should then be fine.
Quota exceeded or unable to write files to project / work / scratch / archive
Most likely this is due to wrong Linux filesystem permissions. Quota is set per group (e.g. braindata), and by default files go to your default group (domain users). If this happens under some project, scratch, etc. directory, it will complain about “Disk quota exceeded”.
In general, admins fix this by setting the directory permissions so that everything goes right automatically. But sometimes this breaks down; some programs are often responsible (rsync and tar, for instance).
There are a few easy ways to fix this:
In a terminal, run the command find . -type d -exec chmod g+rwxs {} \; under your project directory. After this, all should be working normally again.
If it’s on scratch or work, see the Triton quotas page.
Contact NBE-IT and we will reset the directory permissions for the given directory.
I cannot start Firefox
There are two reasons for this.
1. Your network home disk is full
# Go to your home directory
cd ~
# Check disk usage, including hidden files
du -sh .[!.]* *
The sum should be less than the max quota, which is 100GB (as of 2020). If your disk is full, delete something or move it to a local directory, /l/.
2. Something went wrong with your browser profile
If you get an error like “The application did not identify itself”, the following might solve the issue.
Open a terminal and run firefox -P -no-remote. This will launch Firefox and ask you to choose a profile. Note that when you delete a profile, you delete its passwords, bookmarks, etc. So it’s better to create a new profile, migrate your bookmarks, and delete the old one.
Aalto Mac
This page describes the Aalto centrally-managed Mac computers, where login is via Aalto accounts. If you have a standalone laptop (one which does not use your Aalto account), some of this may be relevant, but for the most part you are on your own and you will access your data and Aalto resources via Remote Access.
More instructions: https://inside.aalto.fi/display/ITServices/Mac
Basics
In the Aalto installations, login is via Aalto account only.
When you get a computer, ask to be made primary user (this should be default, but it’s always good to confirm). This will allow you to manage the computer and install software.
The first time you log in, you must be on an Aalto network (wired or aalto wifi) so that the laptop can communicate with Aalto servers and get your login information. After this point, you don’t need to be on the Aalto network anymore.
Login is via your Aalto account. The password stays synced when you connect from an Aalto network.
Full disk encryption
This must be enabled per-user, using FileVault. You should always do this; there is no downside. On Aalto-managed laptops, install “Enable FileVault disk encryption” (it’s a custom Aalto thing that does it for you). To do this manually: “Settings → Privacy → enable FileVault”.
Data
You can mount Aalto filesystems by using SMB. Go to Finder → File or Go (depending on OS) → Connect to Server → enter the smb:// URL from the data storage pages.
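For example, to mount your Aalto home directory, the server URL would be (the share addresses are the same ones listed under CS Linux later on this page):
smb://home.org.aalto.fi/yourusername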
For generic ways of accessing data remotely, see Remote Access. For Aalto data storage locations, see Filesystem details, and for the big picture of where and how to store data, see Science-IT department data principles.
The program AaltoFileSync is pre-installed and can be used to synchronize files, but you basically have to set it up yourself.
Software
.dmg files
If you are the primary user, in the Software Center you can install
the program “Get temporary admin rights”. This will allow you to become an
administrator for 30 minutes at a time. Then, you can install .dmg
files yourself. This is the recommended way of installing .dmg
files.
Aalto software
There is an application called “Managed software center” pre-installed (or “Managed software update” in older versions). You can use this to install a wide variety of ready-packaged software. (ITS instructions).
Homebrew
Homebrew is a handy package manager on Macs. On Aalto Macs, you have to install Brew in your home directory. Once you install Brew, you can easily install whatever you may need.
First install Xcode through Managed Software Centre (either search Xcode, or navigate through Categories -> Productivity -> Xcode).
# Go to wherever you want to have your Brew and run this
mkdir Homebrew && curl -L https://github.com/Homebrew/brew/tarball/master | tar xz -C Homebrew --strip 1
# This is a MUST!!!
echo "export PATH=\$PATH:$(pwd)/Homebrew/bin" >> ~/.zprofile
# Reload the profile
source ~/.zprofile
# Check if brew is correctly installed.
which brew # /Users/username/Homebrew/bin/brew
Older versions of macOS (pre-Mojave) use bash as the default shell, so you need to set up the environment differently:
echo "export PATH=\$PATH:$(pwd)/Homebrew/bin" >> ~/.bash_profile
# Reload the profile
source ~/.bash_profile
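After the setup, a quick check that your home-directory Brew works (wget as an illustrative package):
brew install wget
which wget   # should point into your Homebrew/bin directory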
Admin rights
The “Get temporary admin rights” program described under .dmg file installation above lets you get some admin rights - but not full sudo and all.
You don’t need full admin rights to install brew.
If you need sudo rights, you need a workstation admin (wa) account. Contact your department admin for details.
CS Mac backup service
The CS department provides a full clone-backup service for Aalto-installation Mac computers. Aalto-installation means the OS is installed from the Aalto repository.
We use Apple Time Machine. Backup is wireless, encrypted, automatic, and periodic, and it can be used even outside the campus using the Aalto VPN. It is a “clone” because we can restore your environment in its entirety. You can think of it as a snapshot backup (though it isn’t). We provide twice the space of your SSD: if your Mac has 250GB of space, you get 500GB of backup space. If you would like to enroll in the program, please pay a visit to our office, T-talo A243.
Encryption
We provide two options for encryption:
You set your own encryption key and only you know it. The key is neither recoverable nor resettable. You lose it, you lose your backup.
We set it on behalf of you and only we know it.
Restore
With Time Machine you have two options for restore.
Partial
You can restore file-by-file. Watch the video.
Complete restore
In case your Mac is broken, you can restore completely on a new Mac. For this, you must visit us.
Trouble-shooting
Can’t find the backup destination
This happens because either (1) you changed your Aalto password, or (2) the server is down. Debug in the following manner:
# Is the server alive?
ping timemachine.cs.aalto.fi
# If alive, probably it's your keychain.
# Watch the video below.
# If dead, something's wrong with the server.
# Please contact CS-IT.
Corrupted backup

This is an unfortunate situation with an unknown reason. We take a snapshot of your backup. Please contact CS-IT.
Common problems
Insane CPU rampage by UserEventAgent
It is a mysterious bug which Apple hasn’t solved yet. We can reinstall your system for you.
Aalto Windows
This page describes the Aalto centrally-managed Windows computers, where login is via Aalto accounts. If you have a standalone laptop (login not using Aalto account), some of this may be relevant, but for the most part you will access your data and Aalto resources via Remote Access.
More instructions: https://inside.aalto.fi/display/ITServices/Windows
Basics
In the Aalto installations, login is via Aalto account only.
You must be on the Aalto network the first time you connect.
Full disk encryption
Aalto Windows laptops come with this by default, tied to your login password. To verify encryption, find “BitLocker” from the start menu and check that it is on.
Note that on standalone installations, you can enable encryption by searching for “TrueCrypt” in programs - it is already included.
Data
This section details built-in ways of accessing data storage locations. For generic ways of accessing remotely, see Remote Access. For Aalto data storage locations, see Filesystem details and Science-IT department data principles.
Your home directory is automatically synced to some degree.
You can store local data at C:\LocalUserData\User-data\<yourusername>. Note that this is not backed up or supported. For data you want to exist in a few years, use a network drive. It can be worth making a working copy here, since it can be faster.
Software
Aalto software
There is a Windows software self-service portal which can be used to install some software automatically.
Installing other software
To install most other software, you need to apply for a workstation
admin (wa
) account. Contact your department IT to get the process
started.
Common problems
CodeRefinery
The NeIC-sponsored CodeRefinery project is being hosted in Otaniemi (previously we had one in Otaniemi from 12-14 December). We highly recommend this workshop. (Note: it is full and registration is closed.)
If you have an Aalto centrally-managed laptop, this page gives hints on software installation. You have to use these instructions along with the CodeRefinery instructions.
Note
These are only for the Aalto centrally managed laptops. They are not needed if you have your own computer you administer yourself, or an Aalto standalone computer you administer yourself.
Warning
You should request primary user rights early, or else it won’t be ready on time and you will have trouble installing things. For Windows computers, request a wa (workstation admin) account.
Linux
You need to be primary user in order to install your own packages. Ask your IT support to make you one if you aren’t already. You can check with the groups command (see if you are in COMPUTERNAME-primaryuser).
Install the required packages this way. If you are primary user, you will be asked to enter your own password:
pkcon install bash git git-gui gitk git-cola meld gfortran gcc g++ build-essential snakemake sphinx-doc python3-pytest python3-pep8
For Python, we strongly recommend using Anaconda to get the latest versions of software and to have things set up mostly automatically.
You should install Anaconda to your home directory like normal (this is the best way to get the latest versions of the Python packages). If your default shell is zsh (this is the Aalto default, unless you changed it yourself), then Anaconda won’t be automatically put into the path. Either copy the relevant lines from .bashrc to .zshrc (you may have to make this file), or just start bash before starting the Anaconda programs.
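One way to do the zsh setup, as a sketch (assuming a recent conda and the default install location ~/anaconda3):
~/anaconda3/bin/conda init zsh   # writes the setup lines into ~/.zshrc
source ~/.zshrc                  # or just open a new terminal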
Jupyter: use via Anaconda.
PyCharm: the “snap package” installer requires root, which most people don’t have. Instead, download the standalone community file (.tar.gz), unpack it, and then just run it using ./pycharm.../bin/pycharm.sh. The custom script in /usr/local/bin won’t work since you aren’t root, but you can make an alias in .bashrc or .zshrc: alias pycharm=... (path here).
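For example, with a hypothetical unpack location (adjust the directory to match your download):
alias pycharm="$HOME/pycharm-community-<version>/bin/pycharm.sh"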
Docker: you can’t easily do this on the Aalto laptops, but it is optional.
Mac
You also need to be primary user to install software.
If you are the primary user, in the Software Center you can install “Get temporary admin rights”. This will allow you to become an administrator for 30 minutes at a time. Then, you can install .dmg files yourself (use this for git, meld, cmake, docker).
Anaconda: you should be able to do “Install for me only”.
Xcode can be installed via the Software Center.
Jupyter: use it via Anaconda, no need to install.
Windows
You should request a workstation-admin account (“wa account”); then you can install everything. Note: these instructions are not extensively tested.
Git and bash can be installed according to the instructions.
Visual diff tools: needs a wa account.
Mingw: not working, but this seems to be because the download fails.
Cmake: needs a wa account.
Docker: untested; likely requires a wa account.
CS Linux
CS Linux is an OS used for computers not supported by Aalto Linux. It is maintained by CS department IT and is currently only available to researchers in the CS department. The OS is intended for setups for which the Aalto Linux setup is not flexible enough (mainly custom-built setups). The Aalto Linux setup is recommended if it serves your needs.
Currently only desktop setups are available.
Basics
Home directory. CS Linux computers have a local home directory (instead of the Aalto home directory found in Aalto Linux).
Aalto credentials are used for login. Anyone in the CS department is able to login to any computer on-site. However, ssh login has to be enabled manually by CS-IT.
The systems are centrally managed with the help of Puppet.
CS Linux computers operate on a dedicated VLAN (different from Aalto Linux). The ethernet port used must be configured before using the computer. The login will not work if the computer is connected to the wrong VLAN. Changes to port configurations can be requested from CS-IT.
The default user interface for CS Linux is GNOME. If your computer doesn’t have a graphical interface, but you would like it to have one, please contact CS-IT and it can be configured remotely with the help of Puppet.
Requesting a new CS Linux computer
Let CS-IT know who will be using the computer and if they need SSH and sudo access. The primary user receives sudo rights by default.
Let CS-IT know if you would like a graphical interface to be installed.
When you are done with a computer
Let CS-IT know that you are leaving and bring the computer to the CS-IT office or arrange for someone from the IT team to pick it up. CS-IT will perform a secure erase on the hard drive(s). This is important as most of the data is stored locally.
Software
Ubuntu packages
If you are the primary user, you have sudo rights. You can then use apt to install packages.
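For example (htop is just a placeholder package name):
sudo apt update
sudo apt install htop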
The module system
The module command provides a way to manage various installed versions of software across many computers. See here for a detailed description of the module system.
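A quick sketch of typical module usage (module names vary per computer; anaconda is one example used elsewhere on this page):
module avail          # list software available via modules
module load anaconda  # make one version of a package available
module list           # show currently loaded modules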
Data
Everything is stored locally, meaning that there are no backups. Anyone with physical access to the computer is able to access the data stored on it.
You are able to mount the Aalto home directory as well as the teamwork directories (requires sudo rights). This can be done by “connect to server” in the file browser for easy graphical access, or via the command line to choose the mounting location.
Samba share addresses:
smb://home.org.aalto.fi/$USER
smb://tw-cs.org.aalto.fi/project/$projectname/ (replace $projectname)
smb://tw-cs.org.aalto.fi/archive/$archivename/ (replace $archivename)
Mounting an SMB share using the terminal
sudo mount -t cifs -o username=$USER,cruid=$USER,uid=$(id -u $USER),gid=$(id -g $USER),sec=krb5 //tw-cs.org.aalto.fi/project/ ~/mnt
Note
Notice that Samba mounts don’t include information about file and directory permissions. This means that all files and directories will have the default permissions. This also applies to anything that you create.
User accounts
User accounts on CS Linux are managed via the central configuration management. If you want to grant access to the system for other users, please contact CS-IT. Creating local users manually may cause unexpected issues.
Admin rights
The primary user of the computer receives sudo rights by default.
Sudo rights can also be requested for other users (requires approval from primary user). These requests can be sent to CS-IT.
CS Linux computers are centrally managed, and this centralized management must not be broken. If sudo rights have been used to change system settings, our support is mostly limited to reinstalling the computer.
Remote Access
This page describes remote access solutions. Most of them are provided by Aalto, but there are also instructions for accessing your workstations here. See Aalto Inside for more details.
Linux shell servers
Department servers have project, archive, scratch, etc. mounted, so they are good to use for research purposes.
CS: magi.cs.aalto.fi: department staff server (no heavy computing; access to workstations, with the file systems mounted; use the kinit command first if project directories are not accessible).
NBE: amor.org.aalto.fi, same as above.
Math: elliptic.aalto.fi, illposed.aalto.fi, same as above (but no project, archive, or scratch directories).
kosh.aalto.fi, lyta.aalto.fi: Aalto, for general login use (no heavy computing).
brute.aalto.fi, force.aalto.fi: Aalto, for “light computing” (expect them to be overloaded and not that useful). If you are trying to use these for research, you really want to be using Triton instead.
viila.aalto.fi: staff server (access to workstations, with filesystems mounted, but you need kinit to access them); it is somewhat outdated and different.
Your home directory is shared on all Aalto shell servers, and that means .ssh/authorized_keys as well.
You can use any of these to mount things remotely via sshfs (see the sketch below). This is easy on Linux, harder but possible on other OSs; you are on your own here. You still need kinit at the same time.
The CS filesystems project and archive, and the Triton filesystems scratch and work, are mounted on magi (and viila.aalto.fi) (see storage).
For any of these, if you can't access something, run kinit!
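A minimal sshfs sketch for Linux, assuming you want your magi home directory under ~/remote (the mount point is an arbitrary choice):
mkdir -p ~/remote
sshfs USERNAME@magi.cs.aalto.fi: ~/remote   # mount your remote home directory
fusermount -u ~/remote                      # unmount when done
As noted above, some filesystems still require kinit on the server side.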
VPN
To access certain things, you need to be able to connect to the Aalto networks. VPN is one way of doing that. This is easy and automatically set up on Aalto computers.
Main Aalto instructions. Below is some quick reference info.
Generic: OpenConnect/Cisco AnyConnect protocols, vpn.aalto.fi, vpn1.aalto.fi or vpn2.aalto.fi.
Aalto Linux: Status bar → Network → VPN Connections → Aalto TLS VPN.
Aalto mac: Dock → Launchpad → Cisco AnyConnect Secure Mobility Client
Aalto windows: Start → Search → AnyConnect
Personal Linux laptops: use OpenConnect. Configuration on Ubuntu: Networks → Add Connection → Cisco AnyConnect compatible VPN → vpn.aalto.fi. Then connect and use your Aalto username/password. Or from the command line: openconnect https://vpn.aalto.fi
Personal Mac: use the Cisco AnyConnect VPN Client.
Personal Windows: use the Cisco AnyConnect VPN Client.
SSH SOCKS proxy
If you need to access the Aalto networks, but can’t send all of your traffic through the Aalto network, you can use SSH + the SSH built in SOCKS proxy. Only use this on computers that only you control, since the proxy itself doesn’t have authentication.
Connect to an Aalto server using SSH with the -D option:
$ ssh -D 8080 USERNAME@kosh.aalto.fi
Configure your web browser or other applications to use a SOCKS5 proxy on localhost:8080 for connections. Remember to revert when done, or else you can't connect to anything once the SSH tunnel stops (“proxy refusing connections”).
The web browser extension FoxyProxy Standard (available on many web browsers despite the name) may be useful here, because you can direct only the domains you want through the proxy.
Go to the FoxyProxy options
Configure a proxy with some title (“Aalto 8080” for example), Proxy type SOCKS5, Proxy IP 127.0.0.1 (localhost), port 8080 (or whatever you used in the ssh command), no username or password.
Save and edit patterns
Add a new pattern (“New White”) and use a pattern you would like, for example *.aalto.fi, and make sure it's enabled.
Save.
Now, in this browser, when you try to access anything at *.aalto.fi, it will go through the SOCKS proxy and appear to come from the computer to which you connected. By digging around in the options or using the extension button, you can direct everything through the proxy and so on.
This can also be used for SSH, on Linux at least (install the program netcat-openbsd). The proxy address must match the one you gave to -D above:
ssh -o 'ProxyCommand=nc -X 5 -x 127.0.0.1:8080 %h %p' HOSTNAME
Remote mounting of network filesystems
See also
Accessing your Linux workstation / Triton remotely
Remote access to desktop workstations is available via the university staff shell server viila.aalto.fi or the department-specific servers magi.cs.aalto.fi (CS), amor.org.aalto.fi (NBE), and elliptic.aalto.fi/illposed.aalto.fi (Math). You need to be the PrimaryUser of the desktop in order to ssh to it.
Remote access to Triton is available from any Aalto shell server: viila, kosh.aalto.fi, etc.
When connecting from outside Aalto, you have to use both SSH keys and a password, or use the VPN.
See SSH for generic SSH instructions.
SSHing directly to computers using openssh ProxyJump: put this in your .ssh/config file under the proper Host line: ProxyJump viila.aalto.fi (or for older SSH clients, ProxyCommand ssh viila.aalto.fi -W %h:%p). Note that unless your local username matches your Aalto username, or you have defined the username for viila.org.aalto.fi elsewhere in the SSH config, you will have to use the format aaltousername@viila.org.aalto.fi instead. See the example below.
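For example, a minimal ~/.ssh/config entry (WORKSTATION and AALTOUSERNAME are placeholders for your desktop's hostname and your Aalto username):
Host WORKSTATION.cs.aalto.fi
    User AALTOUSERNAME
    ProxyJump AALTOUSERNAME@viila.aalto.fi
With this, ssh WORKSTATION.cs.aalto.fi first connects to viila and then jumps to the workstation.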
Remote desktop
Aalto has remote desktops available at https://vdi.aalto.fi and http://mfavdi.aalto.fi/. This works from any network.
There are both Windows and Linux desktops available. They are arranged as virtual machines with the normal desktop installations, so they have access to all the important filesystems and all of /m/{dept}/....
Aalto Gitlab
https://version.aalto.fi is a Gitlab installation for the Aalto community. Gitlab is a git server and hosting facility (an open source Github, basically).
Note
This page is about https://version.aalto.fi, the Aalto gitlab installation.
scicomp/git contains our pointers for Git usage in general.
Git migration contains information on switching from subversion or other git repositories to Gitlab.
Git in general
Git seems to have become the most popular and supported version control system, even if it does have some rough corners. See the general git page on this site for pointers.
Aalto Gitlab service
Aalto has a self-hosted Gitlab installation at https://version.aalto.fi, which has replaced most department-specific Gitlabs. With Aalto Gitlab, you can:
Have unlimited private repositories
Have whatever groups you need
Get local support
The Aalto instructions can be found here, and general gitlab help here.
All support is provided by Aalto ITS. Since all data is stored within Aalto and is managed by Aalto, this is suitable for materials up to the “confidential” level.
Extra instructions for Aalto Gitlab
Always log in with HAKA wherever you see the button. To use your Aalto account otherwise, use username@aalto.fi and your Aalto password (for example, use this with https pushing and pulling). But you really should try to configure ssh keys for pushing and pulling.
For outside/public sharing read-only, you can make repositories public.
If you need to share with an outside collaborator, this is supported. These outside partners can access repositories shared with them, but not make new ones. They will get a special gitlab username/password, and should use that with the normal gitlab login boxes. To request a collaborator account, their Aalto sponsor should go here to the request form (employees only, more info). (You can always set a repository as public, so anyone can clone. Another hackish method is to add ssh deploy keys (read-only or read-write) for outside collaborators, but this isn't recommended for serious cases.)
For public projects where you want to build a community, you can also consider Github. There’s nothing wrong with having both sites for your group, just make sure people know about both. Gitlab can have public projects, and Github can also have group organizations.
NOTE! If your work contract type changes (e.g. staff → visitor, student → employee, different department), Aalto Version blocks your access as a “security” measure. Please contact the Aalto ITS Servicedesk <servicedesk@aalto.fi> to unblock you. This is annoying, but can't be fixed yet.
The service doesn’t have quotas right now, but has limited resources and we expect everyone to use disk space responsibly. If you use too much space, you will be contacted. Just do your best to use the service well, and the admins will work with you to get your work done.
CodeRefinery Gitlab and Gitlab CI service
CodeRefinery is a publicly funded project (by Nordforsk / Nordic e-Infrastructure Collaboration) which provides teaching and a GitLab platform for Nordic researchers. This is functionally the same as the Aalto Gitlab and may be more useful if you have cross-university collaboration, but it requires more setup.
They also have a Gitlab CI (continuous integration) service which can be used for automated building and testing. This is also free for Nordic researchers, and can be used even with Aalto Gitlab. Check their repository site for info; if it isn't there yet, mail their support asking about it.
Recommendations
version.aalto.fi is a great resource for research groups. Research groups should create a “Gitlab group” and give all their members access to it. This way, code and important data will last longer than a single person's time at Aalto. Add everyone as a member to this group so that everyone can easily find code.
Think about the long term. Will you need access to this code in 5 years, and if so what will you do?
If you are a research group, put your code in a Gitlab group. The users can constantly switch, but the code will stay with the group.
If you are an individual, plan on needing a different location once you leave Aalto. If your code can become group code, include it in the group repository so at least someone will keep it at Aalto.
Zenodo is a long-term data archive. When you publish projects, consider archiving your code there. (It has integration with Github, which you might prefer to use if you are actually making your code open.) Your code is then citeable with a DOI.
In all cases, if multiple people are working on something, think about licenses at the beginning. If you don’t, you may be blocked from using your own work.
FAQ
What password should I use? It is best to use HAKA to log in to gitlab, in which case you don’t need a separate gitlab password. To push, it is best to use ssh keys.
My account is blocked! That’s not a question, but Gitlab blocks users when your Aalto unit changes. This is unfortunately part of gitlab and hasn’t been worked around yet. Mail servicedesk@aalto.fi with your username and request “my version.aalto.fi username XXX be unblocked (because my aalto unit changed)” and they should do it.
What happens when I leave, can I still access my stuff? Aalto can only support its community, so your projects should be owned by a group with which you can continue collaborating after you leave (note that this is a major reason for group-based access control!). Email servicedesk for information on what to do to become an external collaborator.
When are accounts/data deleted? In 2017, the deletion policy was findable in the privacy policy: 6 months after the Aalto account closed, 24 months after last login, or 12 months after the last login of an external collaborator. Now that link is dead and only points to the general IT Services privacy notice.
Are there continuous integration (CI) services available? Not from Aalto (though you can run your own runners), but the CodeRefinery project offers free CI services for Nordic researchers; see their site and the description above.
Harbor: Container registry for images and artifacts
Aalto University provides an instance of the popular Harbor registry for storing and managing images and other artifacts. The service can be found at https://harbor.cs.aalto.fi.
Web login
Currently only Aalto users can log in to the service. When you visit https://harbor.cs.aalto.fi you can choose between an OIDC provider and the local DB. Choose the OIDC provider: it will take you to the Microsoft sign-in page for Aalto University.
Projects
New projects can only be created by CS-IT (guru at cs dot aalto.fi).
Each project has project administrators, who manage it, and members.
Each new member must be added to a project individually. Adding existing Aalto unix groups isn't currently possible without a special request and extra work (due to a limitation of the Aalto Azure directory). If a group would be very helpful to your work, ask.
Trivy vulnerability scanner by Aqua Security is available for all projects. You can see security vulnerabilities on each image page.
Docker access
Never use your Aalto password from the docker command line - push is via a token.
Before accessing the registry for the first time, you must install docker-credential-helpers and configure docker to use your local credential store.
To install docker-credential-helpers on Aalto Linux run:
pkcon install golang-docker-credential-helpers
Then add the following to ~/.docker/config.json:
{
"credsStore": "secretservice"
}
Now when you log in to the registry using docker, the token is stored in your credential store.
Logging in to Harbor with docker doesn't happen with your Aalto password; instead you need to get a CLI secret from the Harbor web app. You can find your secret by clicking your email address in the top right corner and selecting "User Profile" from the dropdown. The last item in the user profile dialog is the CLI secret, which you can copy by clicking the icon next to the field. You can also generate a new secret or upload your own secret.
Now run:
docker login https://harbor.cs.aalto.fi
For the username, enter the username shown in the user profile dialog, and for the password use the CLI secret from the same dialog.
Tag the image first before pushing (images must be prefixed with harbor.cs.aalto.fi):
docker tag <source_image>[:<tag>] harbor.cs.aalto.fi/<project>/<repository>[:<tag>]
To push an image to a project use:
docker push harbor.cs.aalto.fi/<project>/<repository>[:<tag>]
You can find the project-specific tag and push commands on the repositories page of the project. Similarly, the pull commands for individual artifacts can be found on the artifacts page of their repository.
Robot accounts
Harbor supports robot accounts for projects. They can be created from the robot accounts page of a project. Each robot account can have a different set of permissions, and each should have the minimal permissions needed for its use case. After you create a robot account, Harbor generates a secret for it. This secret is used to log in to the account in the same way as with normal accounts. If you forget the secret, you can refresh it to a new one later.
Security
Aalto's Harbor is officially security-rated for public data. Still, if you set permissions correctly, a project should only be available to those with permissions (unless it's set to public).
JupyterHub (jupyter.cs)
Note
This page is about the JupyterHub for light use and teaching, https://jupyter.cs.aalto.fi. The Triton JupyterHub for research is documented at JupyterHub on Triton.
NBGrader in JupyterLab is the default (2023 Autumn)
JupyterLab interface is now available and is the default option for new course servers. Doing Assignments in JupyterLab tells more about using it. You can access nbgrader from the JupyterLab menu.
https://jupyter.cs.aalto.fi is a JupyterHub installation for teaching and light usage. Anyone at Aalto may use this for generic light computing needs, and teachers may create courses with assignments using nbgrader. Jupyter has a rich ecosystem of tools for modern computing.
Basic usage
Log in with any valid Aalto account. Our environment may be used for light computing and programming by anyone.
Your persistent storage has a quota of 1GB. Your data belongs to you, may be accessed from outside, and currently is planned to last no more than one year from last login. You are limited to several CPUs and 1GB memory.
Your notebook server is stopped after 60 minutes of idle time, or 8 hours max time. Please close the Jupyter tab if you are not using it, or else it may still appear as active.
There are some general-use computing environments. You will begin with Jupyter in the /notebooks directory, which is your persistent storage. Your server is completely re-created each time it restarts: everything in your home directory is re-created, and only /notebooks is preserved. (Certain files like .gitconfig are preserved by linking into /notebooks/.home/....)
You begin with a computing server with the usual scipy stack installed, plus a lot of other software used in courses here.
You may access your data as a network drive by SMB mounting it on your own computer - see Accessing JupyterHub (jupyter.cs) data. This allows you total control over your data.
JupyterHub has no GPUs, but you can check out the instructions for using the Paniikki GPUs with the JupyterHub data. These instructions are still under development.
Each notebook server is basically a Linux container primarily running a Jupyter notebook server. You may create Jupyter notebooks to interact with code. To access a Linux bash shell, create a new terminal - this is a great place to learn something new.
Accessing JupyterHub (jupyter.cs) data
Unlike many JupyterHub deployments, your data is yours and you have many different ways to access it. Thus, we don't just have jupyter.cs, but a whole constellation of ways to access it and do your work, depending on what suits you best for each part.
Your data (and as an instructor, your course’s data) can be accessed many ways:
On jupyter.cs.
Via network drive on your own computer as local files.
On Aalto shell servers (such as kosh.aalto.fi).
On other department/university workstations.
On Paniikki and Aalto computers
On Paniikki, and on the Aalto servers kosh.aalto.fi, lyta.aalto.fi, brute.aalto.fi, and force.aalto.fi (and possibly more), the JupyterHub data is available automatically. You can, for example, use the Paniikki GPUs.
Data is available under the path /m/jhnas/jupyter. This Linux-server path is also available on the hub itself, in case you want to write portable files.
Name | Path on hub | Path on Linux servers
---|---|---
personal notebooks | /notebooks | /m/jhnas/jupyter/u/$nn/$username
course data | /coursedata | /m/jhnas/jupyter/course/$course_slug/data
course instructor files | /course | /m/jhnas/jupyter/course/$course_slug/files
shared data | /mnt/jupyter/shareddata | /m/jhnas/jupyter/shareddata

Variable seen above | Meaning
---|---
$username | Your Aalto username
$nn | The two numbers you see in echo $HOME
$course_slug | The short name of your course (the course “slug”)
You can change directly to your notebook directory by using cd /m/jhnas/jupyter/${HOME%/unix}.
You can link it to your home directory so that it's easily available: in a terminal, run /m/jhnas/jupyter/u/makedir.sh and you will automatically get a link from ~/jupyter in your home directory to your user data.
Permission denied? Run kinit in the shell - this authenticates you to the Aalto server and is required for secure access. If you log in with ssh keys, you may need to do this.
Remote access via network drive
Basic info
Name | Network drive path
---|---
personal notebooks | smb://jhnas.org.aalto.fi/$username
course data | smb://jhnas.org.aalto.fi/course/$course_slug/data
course instructor files | smb://jhnas.org.aalto.fi/course/$course_slug/files
shared data | smb://jhnas.org.aalto.fi/shareddata
You can do an SMB mount, which makes the data available as a network drive. You will have the same copy of the data as on the hub - actually, the same data, so edits immediately take effect in both places, just like your home directory. You must be on an Aalto network, which for students practically means you must be connected to the Aalto VPN (see the VPN instructions) or use an Aalto computer. The “aalto” wifi network does not work unless you have an Aalto computer.
Linux: use “Connect to Server” from the file browser. The path is smb://jhnas.org.aalto.fi/$username. You may need to use AALTO\username as your username. If there is a separate “domain” option, use AALTO for the domain and just your username for the username.
Mac: same path as Linux above, “Connect to Server”. Use AALTO\your_username as the username.
Windows: \\jhnas.org.aalto.fi\$username, and use the username AALTO\your_username. Windows sometimes caches the username/password for a long time, so if it does not work, try rebooting.
You can also access course data and shared data by using jhnas.org.aalto.fi/course/ or jhnas.org.aalto.fi/shareddata/.
See also
Mounting network drives in Windows has the same instructions, but for Aalto home directories. Anything there should apply here, too.
Using GPUs
One problem with our JupyterHub so far is that we don’t have GPUs available. But, because our data is available to other computers, you can use the Paniikki: Computer Lab For Students GPUs (quite good ones) to get all the power you need. To do this, you just need to access the Jupyter data on these classroom computers.
Terminal: first, start a terminal. You can navigate to your data following the instructions above: cd /m/jhnas/jupyter/${HOME%/unix}. From there, navigate to the right directories and do what is needed.
File browser: navigate to the path /m/jhnas/jupyter/u/$nn/$username, where $nn is the two numbers you see when you do echo $HOME in a terminal. To open a terminal from a location, right click and select “Open in Terminal”.
Now that you have the terminal and the data, you can do whatever you want with it. Presumably, you will start Jupyter here - but first you want to make the right software available. If your course tells you how to do that using an Anaconda environment, go ahead and do it. (Please don't install large amounts of software like Anaconda in the Jupyter data directories - they are for notebooks and small-to-medium data.)
Using the built-in Anaconda, you can load the Python modules with module load anaconda and start Jupyter with jupyter notebook. Note that the module is now named anaconda, not anaconda3 as in older instructions.
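Putting the two steps together in a Paniikki terminal:
module load anaconda   # the module is now "anaconda", not "anaconda3"
jupyter notebook       # starts Jupyter in the current directory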
Terms of use
This service must be used according to the general IT usage policy of Aalto University (including no unlawful purposes). It should only be used for academic purposes (self-exploration and programming for your own interests count as academic purposes, but commercial use is not allowed). For more information, see the Aalto policies. Heavy non-interactive computational use is not allowed: basically, don't script things to run in the background when you are not around; using the service in person is OK. For research computing, see the Triton cluster.
Courses and assignments
New: nbgrader in JupyterLab instructions: Doing Assignments in JupyterLab
Some courses may use the nbgrader system to give and grade assignments. These courses have special entries in the list. If you are a student in such a course, you will have a special environment for that course. Your instructor may customize the environment, or it may be one of our generic environments.
If your course is using Jupyter with nbgrader, there are some built-in features for dealing with assignments. Under the Assignment list tab, you can see the assignments for your course (only the course you selected when starting your notebook server). You can fetch assignments to work on them - they are then copied to your personal /notebooks directory. You can edit the assignments there - fill out the solutions and validate them. Once you are done, you can submit them from the same assignment list. We have a short tutorial that walks through this process using the new JupyterLab interface.
A course may give you access to a /coursedata folder with any course-specific data.
By default, everyone may access every course's environment and fetch its assignments. We don't stop you from submitting assignments to courses you are not enrolled in - but please don't submit unless you are registered, because the instructors must then deal with it. Some courses may restrict who can launch their notebook servers: if you cannot see or launch the notebook server for a course you are registered for, please contact your instructor.
Note that the /notebooks folder is shared across all of your courses/servers, but the assignment list is specific to the course you have started for your current session. Thus, you should pay attention to what you launch. Remember to clean up your data sometimes.
Instructors
JupyterHub (jupyter.cs) for instructors
See also
Main article with general usage instructions: Jupyterhub for Teaching. For research purposes, see Triton JupyterHub.
Jupyter is an open-source web-based system for interactive computing in “notebooks”, well known for its features and ease of use. Nbgrader (“notebook grader”) is a Jupyter extension to support automatic grading via notebooks. The primary advantage (and drawback) is its simplicity: there is very little difference between the notebook format for research work and for automatic grading. This lowers the barrier to creating assignments and means that the interface students (and you) learn is directly applicable to (research) projects that may come later.
Nbgrader documentation is at https://nbgrader.readthedocs.io/, and is necessary reading to understand how to use it. For a quickstart on the notebook format, see the highlights page. However, the Noteable service documentation (https://noteable.edina.ac.uk/documentation/) is generally much better, and most of it is applicable here as well. The information in these is not duplicated here, and is required in order to use jupyter.cs.
Below, you mostly find documentation specific to jupyter.cs and important notes you do not find other places.
jupyter.cs news
Spring 2024
We had a user’s group meeting. You can find the slides here, including commentary.
Scicomp garage now has a focus day for jupyter.cs on Wednesdays.
Autumn 2023
JupyterLab is now available and is the default for new course servers. If you’d like to continue using Jupyter Notebook for your courses, let us know when requesting a new course. JupyterLab now supports everything nbgrader needs, though the user interface is slightly different. You can send Doing Assignments in JupyterLab to your students for instructions.
Summer/Autumn 2020
You can now make a direct link that will spawn a notebook server, for example for a course with a slug of testcourse: https://jupyter.cs.aalto.fi/hub/spawn?profile=testcourse
If the user is already running a server, it will not switch to the new course. Expect some subtle confusion with this. Full info in FAQ and hints.
Basics
The JupyterHub installation provides a way to offer a notebook-based computational environment to students. It is best to think of this service not as a way to do assignments, but as a general light computing environment that is designed to be easy enough to be used for courses. Thus, students should feel empowered to do their own computing, and this should feel like a stepping stone to using their own systems set up for scientific computing. Students' own data is persistent as they go through courses, and they need to learn how to manage it themselves. Jupyter works best for project/report type workflows, not lesson/exercise workflows, but of course it can do that too. In particular, there is no real possibility for real-time grading and so on.
Optionally, you may use nbgrader (notebook grader) to make assignments, release them to students, collect them, autograde them, manually grade, and then export a csv/database of grades. From that point, it is up to you to manage everything. There is currently no integration with any other system, except that Aalto accounts are used to log in.
What does this mean? Jupyter is not a learning management system (even when coupled with nbgrader), it’s “a way to make computational narratives”. This means that this is not a point and click solution to running courses, but a base to build computations on. In order to build a course, you need to be prepared to do your own scripting and connections using the terminal.
You may find the Noteable documentation (serves as a nbgrader user guide) and book Teaching and Learning with Jupyter (broad, less useful) helpful.
Currently we support Python the most, but there are other language kernels available for Jupyter. For research purposes, see the Triton Jupyter page.
Limits
This is not a captive environment: students may always trivially remove their files and data, and may share notebooks across different courses. See above for the link to isolate-environment with instructions for fixing this.
We don’t have unlimited computational resources, but in practice we have quite a lot. Try to avoid all students doing all the work right before a deadline and you should be fine, even with hundreds of students.
There is no integration with any other learning management system, such as the CS department A+ (yet). The only unique identifier of students is the Aalto username; nbgrader can get you a csv file with these usernames, and what happens after that point is up to you.
There is currently no plagiarism detection support. You will have to handle this yourself somehow for now.
System environment
The following describes the environment in which each Jupyter notebook server runs. This is a normal Linux environment, and you are encouraged to use the shell console to interact with it. In fact, you will need to use the console to do various things, and you will probably need to do some scripting.
Why is everything not a push-button solution? Everyone has such unique needs that we cannot solve all of them ourselves. We can only accomplish our goals if people are able to - and do - do their own scripting.
Linux container
Each time you launch your server, you get a personal Linux container. Everything (except the data) gets reset each time it stops. From the user perspective, it looks like a normal Linux system. Unlike some setups, we allow students to see and browse the whole Linux system (other systems try to hide it, but in reality they can't stop students from accessing it).
Data
/notebooks/ is your per-user area. It's what you see by default, and it is shared among all your courses.
/course/ is the course directory (an nbgrader concept). It is available only to instructors. You need to read the nbgrader instructions to understand how this works.
/coursedata/ is an optional shared course data directory. Instructors can put files here so that students can access them without having to copy data over and over. Instructors can write here; students can only read. Remember to make it readable to all students: chmod -R a+rX /coursedata.
/srv/nbgrader/exchange is the exchange directory, an nbgrader concept, but you generally don't have to worry about it yourself.
Data is available from outside JupyterHub: it is hosted on an Aalto-wide server provided by Aalto. Thus, you can access it on your laptops, on Aalto public shell servers, and more. A fast summary is below, but see Accessing JupyterHub (jupyter.cs) data for the main info.
From your own laptop: the SMB server jhnas.org.aalto.fi, path /vol/jupyter/{course,$username}.
Linux: “Connect to server” from the file browser, URL smb://jhnas.org.aalto.fi/vol/jupyter
Mac: same as Linux.
Windows: \\jhnas.org.aalto.fi\vol\jupyter
Data is also available on public Aalto shell servers such as kosh and lyta, at /m/jhnas/jupyter/.
Software
For Python, software is distributed through conda. You can install your own packages using pip or conda, but everything is reset when you restart the server. This is sort of by design: a person can't permanently break their own environment (restarting gets you to a good state), but you still have flexibility.
You should ask us to install common software which you or your students need, instead of installing it yourself each time. But feel free to install it yourself to get your work done until we do that.
Jupyter
Both JupyterLab and the classic notebook are installed, along with a lot of extensions. If you need more extensions, let us know. New course servers default to the JupyterLab interface (since autumn 2023; see the news above); the classic notebook interface is still available for courses that request it.
Requesting a course
Note
JupyterLab interface is now available and is the default option for new course servers. If you’d still like to use the Jupyter Notebook interface for your course, let us know.
To get started with a course, please read the list below, describe your needs for the relevant items, and contact guru@cs.aalto.fi. Don't worry too much about understanding or answering everything perfectly; just let us know what you want to accomplish and we will guide you to what you need.
Course or not?
If all you need is a Python environment to do assignments and projects, you don’t need to request anything special - students can just use the generic servers for their independent computational needs. Students can upload and download any files they need. You could add data to the “shareddata” location, which is available to any user.
You would want a course environment if you want to distribute assignments to students via the interface and/or collect assignments via the interface.
Request template
To make things faster and more complete, copy and paste the template below into your email to us (guru@cs.aalto.fi), edit all of the fields (if anything is unclear, don't worry: send it and a human will figure it out), and send it to us with any other comments. The format is YAML, by the way (but we can handle the syntax details).
name: CS-E0000 Course Name (Year)
uid: (leave blank, we fill in)
gid: (leave blank, we fill in)
# supervisor = faculty in charge of course
# contacts = primary TAs which should also get emails from us.
# manager = (optional) has rights to add other TAs via
# domesti.cs.aalto.fi (supervisor is always a manager)
# Please separately tell us who the initial TAs are! Managers can
# add more later via domesti.cs.aalto.fi.
supervisor: teacher.in.charge@aalto.fi
contact: [teacher.in.charge@aalto.fi, head.ta@aalto.fi]
#manager: [can_add.tas@aalto.fi]
# if true, create a separate data directory
datadir: false
# Important dates. But not too important, we can always adjust later.
# So far, you need to email us to make it public when you are ready!
public_date: 2020-09-08 # becomes visible to students before course
private_date: 2021-01-31 # hidden from students after course
archive_date: 2021-09-01 # becomes hidden from instructors
delete_date: 2021-09-01 # after this, we ask if it can be deleted
# For the course dates itself (just for our reference, not too important)
start_date: 2020-10-01
end_date: 2020-12-15
course_times: EXAMPLE, fill in: Exercise sessions Tuesday afternoons, Deadlines Fridays at 18
# The dates above actually aren't used. These control visibility:
private: true
archive: false
# Internal use, ignore this. The date is the version of software
# you get (your course won't get surprise updates to software after
# that date).
image: [standard, 2020-01-05]
Course environment options
When requesting a course, please read the following and tell us your requirements in the course request email, guru@cs.aalto.fi (using the template above). If you are using the hub without a specific course item in the selection list, please let us know at least 3a, 6, 7, and 8 below. You don’t need to duplicate stuff in the YAML above.
Required metadata is:

1. Course slug - the permanent identifier of the course.
2. Course name - what students see in the interface.
3. Contact(s) - who to ask about day-to-day matters; could be multiple. Aalto emails or usernames.
   3a. Who should be added to the “announcement” issue and gets announcements about updates during the periods.
4. Supervisor - long-term staff who can answer questions about old data even if the course TAs move on. Might be the same as the contact. This is the “primary owner” of all data according to the Science-IT data policy.
5. Instructors - who will have access to the instructor data? Instructors will be added to an Aalto unix group for the course.
6. Expected number of students - just to keep track of expected load and so on.
7. Course times - sessions when all students will be using it (e.g. lectures, tutorials), and deadlines when you expect many students will be working. These will be added to our hub calendar, to avoid doing maintenance at critical moments. Please do whatever you can to de-peak loads, but in reality we can probably handle whatever you throw at us. Very late-night deadlines are usually not good, since we often do maintenance then (and they are bad for students…).
8. Resource needs - what kind of assignments? Lots of CPU, memory intensive? Knowing how people use the resources helps us to make things work well.
9. Course periods - what periods is the course in? Note: these aren't automatically used yet; you may still have to mail us to make the course private or not.
   9a. Public date - course automatically becomes public on this date (until then, students can't see it).
   9b. Hide date - course automatically goes back to private mode on this date (it's fine and recommended to give a long buffer here).
   9c. Archive date - course goes into “archive” mode after this time and is hidden from instructors, too.
   9d. Delete date - data removed. Not automatic; the contacts will get an email to confirm (we aren't crazy).
A course environment consists of the following (comment on any specifics here):

A course directory /course, available only to instructors. This comes by default, with a quota of a few gigabytes (combined with coursedata). Note: instructors should manage assignments and so on using git or some other version control system, because the course directory lasts only one year and is re-created for the next year.
Software (optional; we recommend using the default and adding what you need): a list of required software, or a docker container containing the Jupyter stack and additional software. By default, we have an image based on the scipy stack plus the latest software that anyone else has requested, as long as it is mutually compatible. You can request additional software, and this is shared among all courses. If you need something special, you may be asked to take our image and extend it yourself. Large version updates to the image are done twice a year during holidays.
(optional) A sample python file or notebook to test that the environment works for your course (which will be made public and open source). We use automated testing on our software images, so that we can be sure that our server images still work when they are updated. If you send us a file, either .py or .ipynb, we will add it to our automatic tests. The minimum is something like an import of the packages you need; a more advanced test would exercise the libraries a little bit - do a minimal, quick calculation.
Computational resources (optional, not recommended): a list of computational resources per image. The default is currently 2GB of memory and 4 processors (oversubscribed). Note that because this is a container, only the memory of the actual Python processes is needed, not the rest of the OS, and memory use tends to be quite small.
Shared data directories: if you have nontrivial data which needs distributing, consider one of these shared directories, which saves it from being copied over and over. The notebook directory itself can only support files of up to 2MB, to prevent possible problems. If the number of students times the amount of data is more than a few hundred MB, strongly consider one of the data directories. Read more about this below.
The “shareddata” directory /mnt/jupyter/shareddata is available in all notebooks on jupyter.cs.aalto.fi (even outside of your course) and also (eventually) on other Aalto servers. This data should be considered public (and have a valid license), even though for now it's only accessible to Aalto accounts.
/coursedata is only available within your course's environment (as chosen from the list). coursedata is also assumed to be readable by everyone at Aalto, though you have more control over it.
If you use either of these, you can embed the paths directly in your notebooks. This is easy for hub use, but makes it harder to copy the notebooks out of the hub to use on your own computers. This is something we are working on.
Also tell us if you want to join the jupyterhub-courses group to share knowledge about making notebooks for teaching.
Course data
See also
One of the best features of jupyter.cs is powerful data access. See Accessing JupyterHub (jupyter.cs) data
If your course uses data, request a coursedata or shareddata directory as mentioned above. You need to add the data there yourself, either through the Jupyter interface or by SMB mounting.
If you use coursedata, just start the course environment; instructors have permission to put files there. Please try to keep things organized!
If you use shareddata, ask for permission to put data there - we need to make the directory for you. When asking, tell us the (computer-readable, short) name of the dataset. In the shareddata directory, you will find a README file with some more instructions. All datasets should have a minimal README (copy the template) which makes them minimally usable for others.
In both cases, you need to chmod -R a+rX the data directory whenever new files or directories are added, so that the data becomes readable to students.
Note: after you are added to the relevant group to access the data, it may take up to 12 hours for your account information to be updated so that the data can be accessed via remote mounting.
Don't include large amounts of data in the assignment directories - at least four copies, if not more, of the data will be made for every student.
Data from other courses
Sometimes, when you are in course A's environment, you want to access the data from course B. For example, A is the next year's edition of course B, and it could be useful to check the old files.
You can access the files for every course which you are an instructor of at the path /m/jhnas/jupyter/course/. The files/ sub-directory is the entire course directory for that course, the same as /course/ in each course image. You can also access the course data directory at data/ there.
All old courses (for which you are listed as an instructor) are available, but if the course is in the “archived” state, you can't modify the files.
Nbgrader basics
Note
We have prepared a tutorial for students on how to fetch/submit assignments: Doing Assignments in JupyterLab. Feel free to share it with them/link to it in MyCourses pages.
“nbgrader is a tool that facilitates creating and grading assignments in the Jupyter notebook. It allows instructors to easily create notebook-based assignments that include both coding exercises and written free-responses. nbgrader then also provides a streamlined interface for quickly grading completed assignments.” - nbgrader upstream documentation
Currently you should read the upstream nbgrader documentation, which we don't repeat. You might also find the Noteable service's nbgrader documentation useful. We have some custom Aalto modifications (also submitted upstream); the most important ones are described under “Aalto specifics” below.
How to use nbgrader
Read the nbgrader docs! We can’t explain everything again here.
The course directory is /course/. Within this are source/, release/, submitted/, autograded/, and feedback/.
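As a sketch, the layout looks like this (directories are created as the corresponding nbgrader steps are run):
/course/source/      # original instructor versions of assignments
/course/release/     # generated student versions
/course/submitted/   # collected student submissions
/course/autograded/  # autograding output
/course/feedback/    # generated feedback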
Things which don’t (necessarily) work in nbgrader
Autograde: if you click the button, it will work, but it is the same as running all your students' code on your own computer with no security whatsoever. A slightly clever student is able to see other students' work (a privacy breach) and alter their grades.
Feedback: While it appears to work, it is designed to operate by hashing the contents of the notebook. Thus, if you have to edit the notebook to make it execute, the hash will be different and the built-in feedback distribution will not work.
Furthermore, don’t expect hidden tests to stay hidden, grading to happen actually automatically, things to be fully automatic, and so on. Do expect a computing environment optimized for learning.
These are just intrinsic to how nbgrader works. We’d hope to fix these sometime, but it will require a more coordinated development effort.
Aalto specifics
Instructors can share responsibilities, multiple instructors can use the exchange to release/collect files, autograde, etc. Note that with this power comes responsibility - try hard to keep things organized.
We can have the assignments in /notebooks while providing whole-filesystem access (so that students can also access /coursedata).
We've added some extra security and sharing measures (most of these are contributed straight to nbgrader).
Join the shared course repository to share knowledge with others
To use nbgrader:

1. Request a course as above.
2. Read the nbgrader user instructions.
3. You can use the Formgrader tab at the top to manage the whole nbgrader process (this automatically appears for instructors). This is the easiest way, because it will automatically set up the course directory, create assignment directories, etc. But you can use the nbgrader command line, too; it is especially useful for autograding.
4. It's good to know how we arrange the course directory anyway, especially if you want to manage things yourself without Formgrader. The “course directory” (nbgrader term) is /course. The original assignments go in /course/source. The other directories are /course/{nbgrader_step} and, for the most part, are automatically managed.
5. New assignments should go in /course/source. Also, don't use + in the assignment filename (nbgrader #928).
6. Manage your assignments with git. See below for some hints about how to do this.
7. If you ever get permission-denied errors, let us know. nbgrader does not support multiple instructors editing the same files that well, but we have tried to patch it to support this. We may still have missed some things.
Version control of course assignments
See also
Shared jupyterhub-courses version.aalto.fi Gitlab organization to share notebooks and knowledge about running JupyterHub courses.
git is a version control system which lets you track file versions, examine history, and share. We assume you have basic knowledge of git; here we give practical tips for using git to manage a course's files. Our vision is that you should use git to manage the normal course files, not the student submissions. Thus, to set up the next year's course, you just clone the existing git repository to the new /course directory. You back up the entire old course directory to preserve the old students' work. Of course, there are other options, too.
Create a new git repository in your /course/ directory and do some basic setup:
cd /course/
git init
git config core.sharedRepository group
You should make a .gitignore file excluding some common things (TODO: maybe more is needed):
gradebook.db
release/
submitted/
autograded/
feedback/
.nbgrader.log
.ipynb_checkpoints
The git repository is in /course, but the main subdirectory of interest is source/, which has the original files, along with whatever other course notes/management files you may have under /course. Everything else is auto-generated.
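A minimal sketch of the first commit under this setup:
cd /course
git add .gitignore source/
git commit -m "Initial import of course assignments"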
Autograding
Warning
nbgrader autograde is not secure, because arbitrary student code is run with instructor permissions. Read more on the instructor page.
Testing a course
Often, people ask “how can I test the assignments if I use nbgrader”? There are different options.
Test as an instructor
The instructor functions don’t overlap with the student functions: you don’t need some special way to test the student experience.
As an instructor, you can release assignments, then go to the student view, fetch, do, submit, etc. This is the same experience as students would get, and really is the full experience (there is not much else to test). You and your TAs can test this way - and of course you can add others just for the purpose of testing it this way.
You can also add TAs just for the purpose of testing it like this, and this would be recommended (as long as nothing secret is in the course directory at the time you are doing these tests - remember to remove them later). You can do this yourself using the group management service we send you (domesti.cs).
An instructor also has an option in the server list to spawn a server as a student. This hides the /course directory and makes the environment identical to that of a student (but it shouldn't matter much).
Send assignments to testers yourself
Before all this fancy Jupyter interface, nbgrader was very simple: assignments were sent around manually. For example, instructors would post assignments on the course website, people would submit via the course site, and the submissions would be downloaded and unpacked into the right places in the course directory. This is still probably the best way to test things out.
Steps:
To send an assignment to someone: download the generated release version from /course/release/$assignment_id/$name.ipynb.
Send it (e.g. by email) to someone. They send it back to you when done. They can do the assignment on their own computer, or upload it to jupyter.cs (the “general use” server works fine).
To receive the assignment, put it back in the course dir as /course/submitted/$STUDENT_NAME/$assignment_id/$name.ipynb. $STUDENT_NAME is invented by you, but the others should match.
That is all: now you can autograde and all, completely normally. This is all that the web interface does anyway.
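As a terminal sketch of the receive step, using the placeholders from the list above ($STUDENT_NAME, $assignment_id, and $name are yours to choose/match):
# Put a returned notebook back into the course directory as a submission
mkdir -p /course/submitted/$STUDENT_NAME/$assignment_id/
cp $name.ipynb /course/submitted/$STUDENT_NAME/$assignment_id/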
When you are done testing, you can delete these $STUDENT_NAME directories. There is also a command to delete them from the database if you want, or more likely you might remove the whole gradebook.db to make sure you start fresh.
The shell access (and other data access, see System environment) makes it easy to manage these files, copy them in and out, and so on.
Add student testers while in private mode
While your course is still in private mode, you can add dedicated student testers. This might be useful before the course becomes public.
While this works, we don’t recommend it unless you really need a lot of testers. It is manual work to set up, and manual work to remove. And likely we are going to forget to clean it up later.
Just like above, you may need to clean up these test students.
Send us a list of Aalto emails or usernames to add.
Request another course
In principle, you could request a whole other jupyter.cs course, just for testing, and we could add private students there. But this would be a lot of work for us (and some for you, when you need to transfer files over - but if you use git that part won’t be that bad).
In general, we don’t do this - one of the above options should work for you. Even if you do this, you likely have to combine with some of the above tasks (requesting us to add students while in private mode).
Nbgrader on your own computer
You can always install nbgrader yourself, on your own computer, to test out how it works. This is probably not for everyone, but it is effective for testing things out.
nbgrader hints
These are practical hints on using nbgrader for grades and assignments. You should also see the separate autograding hints page if you use that.
General
To export grades, nbgrader export is your central point. It will generate a CSV file (using a custom MyCourses exporter), which you can download, check, and upload to MyCourses. You can add the option --MyCoursesExportPlugin.scale_to_100=False to not scale points to 100.
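For example, run from the course directory (the option is the one described above; the output filename depends on the exporter):
nbgrader export                                             # grades scaled to 100
nbgrader export --MyCoursesExportPlugin.scale_to_100=False  # keep raw points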
If students submit assignments/you use autograding
In each notebook (or at least in assignment zero), at the top, have a STUDENT_NUMBER = xxx which students have to fill in. Asking each student to include the student number in a notebook ensures that you can later write a script to capture it.
Testing releasing assignments, without students seeing
Sometimes instructors want to release and collect assignments as a test while the course is running. To understand why the solution is simpler than “make a new course”, we need to understand what “release” and “collect” do: they just move files around. So you can simply move the files to a different place (called the exchange) instead of the one that all students see. The nbgrader docs don't do a good job of explaining this, but behind the scenes it's quite simple, and that simplicity means it's easy to control if you know what you are up to.
You can equally move your test files around to a test, instructor-only exchange for your own testing. (Actually, this isn't even needed: you can just copy them directly, test, and put them back in the submitted/ directory. But some people want more.) So, from the jupyter terminal, we have made these extra aliases:
# Release to test exchange (as instructor):
nbgrader-instructor-exchange release_assignment $assignment_id
# Fetch from test exchange (as instructor, pretending to be a student):
nbgrader-instructor-exchange fetch_assignment $assignment_id
# Submit to test exchange (as instructor, pretending to be a student):
nbgrader-instructor-exchange submit $assignment_id
# Collect to test exchange (as instructor):
nbgrader-instructor-exchange collect $assignment_id
This copies files to and from /course/test-instructor-exchange/, which you can examine and fully control. If you are doing this, you probably need that control anyway. These terms match the normal nbgrader terminology.
There’s no easy way to make a switch between “live exchange” and “instructor exchange” in the web interface, but because of the power of the command line, we can easily do it anyway.
(use type -a nbgrader-instructor-exchange
to see just what it does.)
Known problems
The built-in feedback functionality doesn't work if you modify the submitted notebooks (for example, to make them run); this is an upstream nbgrader limitation. Contact us and we can run a script that will release the feedback to your students.
Course data
If you use the /coursedata directory and want the notebook to be usable outside of JupyterHub too, try this pattern:
import os
if 'AALTO_JUPYTERHUB' in os.environ:
    DATA = '/coursedata'
else:
    DATA = 'put_path_here'
# when loading data, always use os.path.join(DATA, 'the_file.py')
This way, the file can be easily modified to load data from somewhere else. Of course, many variations are possible.
Converting usernames to emails
JupyterHub has no access to emails or student numbers. If you do need to link usernames to email addresses, you can do the following. (Note: the format USERNAME@aalto.fi works for MyCourses upload, so this process is not usually needed these days.)
ssh to kosh.aalto.fi.
cd to wherever you have exported a CSV file with your grades, for example your course directory: cd /m/jhnas/jupyter/course/$course_slug/files/
Run /m/jhnas/jupyter/software/bin/username-to-email.py exported_grades.csv - this will add an email column right after the username column. If the username column is not the zeroth (counting from zero), use the -c $N option to tell it that the usernames are in the Nth column (zero-indexed).
Save the output somewhere, for example by redirecting it with > to a new filename. A full example:
/m/jhnas/jupyter/software/bin/username-to-email.py mycourses_export.csv > mycourses_usernames.csv
This script is also available on github.
Our scripts and resources
Some scripts at https://github.com/AaltoSciComp/jupyter-wiki .
We are going to revise all of our instructor info soon, which may be useful to you later.
Autograding
Autograding is sometimes seen as the "holy grail" of using Jupyter for teaching, but you need a realistic appreciation of the size of the task at hand and of how to do it.
Warning
Running nbgrader autograde is not secure, because arbitrary student code is run with instructor permissions, including access to all instructor files and all other student data. We have designed our own system to make it secure, but we must run it for you; contact us to use it. If you autograde yourself, you are choosing to risk the privacy of all students (probably violating Finnish law) and the integrity of your grades. This is a long-standing design flaw of nbgrader which we have fixed as best we can.
The secure autograder has to be run manually, by us. Fetch your assignments and contact us in good time.
How deep do you go?
Normal Jupyter notebooks, no automation. You might use our JupyterHub to distribute assignments and as a way for students to avoid running their own software, but that’s all.
Use nbgrader facilities to generate a student version of assignments, but handle grading yourself (“manually using nbgrader” or via some other system).
Full autograding.
You may think "autograding will save me effort". It may, but it creates a whole lot of effort in another way: making your assignment robust to autograding. As someone once said: plan for one day to write an assignment, one week to make it autogradeable, and then weeks to make it robust. It doesn't help that most reference material you can find is about basic programming, not about advanced data science projects.
If you use autograding, you have to test your notebooks with many students of different levels. Plan on weeks for this.
What is autograding?
nbgrader is not a fancy thing - it just copies files around. Autograding is only running the whole notebook from top to bottom and looking for errors; if there are errors, points are subtracted. There is no major platform running in the background that does things actually automatically. This is also the primary benefit: a simple system allows your notebooks to be more portable and reusable, and to match more closely to real work. (A conceptual sketch follows.)
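Conceptually, the core of autograding amounts to something like this minimal sketch using nbformat and nbclient (an illustration only, not our actual implementation):

# Execute a submission top to bottom and report cells that raised errors
import nbformat
from nbclient import NotebookClient

nb = nbformat.read('submission.ipynb', as_version=4)
NotebookClient(nb, timeout=60, allow_errors=True).execute()
for cell in nb.cells:
    for output in cell.get('outputs', []):
        if output.get('output_type') == 'error':
            print('Error:', output['ename'], output['evalue'])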
Autograding at Aalto
Design your notebook well
Collect your notebooks using the nbgrader interface. Don’t click any “autograde” buttons (unless you check the notebook yourself first).
Send an email to guru specifying your course and assignment and asking for autograding. We will then run the secure autograding on our server and send you a report on what worked or didn't. Everything gets automatically updated in your environment.
Proceed as normal, for example…:
If autograding didn’t work for some people, you can check them, modify if needed, and re-run the autograding yourself (since you just checked it).
Designing notebooks for autograding
(please contribute or comment on these ideas)
Check out the upstream autograding hints, which include hints on writing good test cases, checking if a certain function has been used, checking how certain functions were called, grading plots, and more. When reading these, note how simple the example code is - your cases will probably be more complex.
Understand the whole loop of transferring files from you, to student versions, to students, and back - and also what the loop is not. There isn't any actual automatic autograding.
Have an assignment zero with no content and worth zero (or one) points, which students have to submit just to show they know how the system works (for example, they don’t forget to push “submit”). Maybe it just has some trivial math or programming exercises. This reduces the cognitive load when doing the real assignments.
Design your notebook with a mindset of unit testing. Note that this isn’t the way that notebooks are usually used, though. Functions and testable functions are good. But note that if you put everything in functions, you lose some of the main benefits of notebooks (interactivity made possible by having things in the top-level scope)! Such is life.
Have sufficient tests visible to the students, so that they can tell if their answers are reasonable. For example, student-visible tests might check the shape of arrays, while hidden tests check the actual values (see the sketch after this list). This also ensures that students are approaching the problem the way you expect.
Similarly, some instructors have found that you must provide plenty of structure so that students only have to fill in well-defined chunks, with instructor code before and after. This ensures that students do "the right thing", but also means that students lose the experience of the "big picture": loading, preprocessing, and finalization - important skills for the future. Instead, they learn to fill in blanks and no more, no less. So, in this way autograding is a trade-off: more grade-able, less realistic about the full cycle of work.
Within your tests, use variable names that won't conflict (for example, a random suffix like testval_randomstring36456165 instead of testval). This reduces the chance of one of your tests conflicting with or overwriting something that the students have added.
Expect students to do everything wrong, and fail in weird ways. Your tests need to be robust.
Consider if your assignment is more open-ended, or there is one specific way to solve it. If it’s more open-ended, consider if it is even realistic to make it autogradeable.
nbgrader relies on metadata in order to do the autograding, so the cell metadata needs to stay intact. Normally you can't even see it for a cell, but it can be damaged if: a) cells are copied and pasted to another notebook file (metadata lost, autograding fails), or b) cells are split (metadata duplicated, and nbgrader then halts). Ask students to copy whole notebook files around when needed, and to generally avoid doing anything unusual with the notebook files.
The environment variable NBGRADER_VALIDATING can be used to tell if the code is being run in the autograding context.
A notebook shouldn't do extensive external operations when autograding, such as downloading data. For that matter, it should try to minimize these when running on JupyterHub, too (a course with 1000 students doesn't need every student to download data separately - that's a recipe to get us blocked). Request a /coursedata/ directory and you can put any type of data there for students to use.
You can use this kind of conditional to handle these cases:
# Setup for if on Aalto JupyterHub or if we are autograding
import os
if 'AALTO_JUPYTERHUB' in os.environ or 'NBGRADER_VALIDATING' in os.environ:
    data_home = '/coursedata/scikit_learn_data/'
    # Make sure that it doesn't try to write new data here,
    # students won't be able to.
else:
    data_home = None  # use default for a personal computer
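As an illustration of the visible/hidden test split mentioned above, here is a minimal sketch of a test cell (result stands for the student's answer variable; the names and the data file path are hypothetical):

# Student-visible test: checks the shape without giving away the answer
assert result.shape == (100, 3), 'result should have shape (100, 3)'
### BEGIN HIDDEN TESTS
# Hidden test: checks actual values; random suffix avoids clobbering student names
import numpy as np
expected_vals_x8f3a1 = np.load('/coursedata/expected_values.npy')  # hypothetical file
assert np.allclose(result, expected_vals_x8f3a1)
### END HIDDEN TESTS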
Warnings to give to students
Don’t copy and paste cells within a notebook. This will mess up the tracking metadata and prevent autograding from working.
Be cautious about things such as copying the whole notebook to Colab to work on it. This has sometimes resulted in removing all notebook metadata, making autograding impossible.
FAQ
This error message:
[ERROR] One or more notebooks in the assignment use an old version of the nbgrader metadata format. Please **back up your class files directory** and then update the metadata using: nbgrader update .
There are various ways this can happen; perhaps the most common is that a student duplicates a cell. There is no solution other than manually fixing the notebook, or grading it yourself. (The error message is confusing and doesn't make sense; a wide variety of internal problems can cause the same error.)
Public copy of assignments
One disadvantage of a powerful system is that we have to limit access to authorized users. But you shouldn’t let this limit access to your course: there is nothing special about our system, and if you allow others to see your assignments, they can run them themselves. For example, the service https://mybinder.org allows anyone to run arbitrary notebooks from git repositories.
This is also important because your course environment will go away after a few months - do you want students to be able to refer to it later? If so, do the below.
Change to the release/ directory and git init; create a new repo here.
Manually git add the necessary assignment files after they are generated from the source directory. Why do we need a new repo? Because you can't have the instructor solutions/answers made public.
Update files (git commit -a or some such) occasionally when new versions come out.
Add a requirements.txt file listing the packages that need to be installed for a student to use the notebooks (see the example after this list). See the MyBinder instructions for different ways to do this, but a normal Python requirements.txt file is easiest for most cases: on each line, put the name of a package from the Python Package Index. There are other formats for R, conda, etc.; see that page.
Then, push this release/ repo to a public repository (check mybinder for supported locations). Make sure you don't ever accidentally push the course repository!
Then, go to https://mybinder.org/ and use the UI to create a URL for the resources. You can paste this button into your course materials, so that it's a one-click process to run your assignments.
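For example, a minimal requirements.txt might look like this (the package names are only illustrative; list whatever your notebooks actually import):

numpy
scipy
matplotlib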
Note that mybinder has a limit of 100 simultaneous users for a repository, to prevent too much use for single organization’s projects. This shouldn’t be the first place you direct students for day-to-day work.
If you have a /coursedata directory, you will have to provide these files some other way. You could put them in the assignment directory and the release/ git repository, but then your notebooks need to be able to load data from two places: /coursedata or the current directory. The pattern sketched below handles this, and has the added advantage that it's easy to swap out DATADIR later, too.
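A minimal sketch of that pattern:

import os
if os.path.exists('/coursedata'):
    DATADIR = '/coursedata'
else:
    DATADIR = '.'
# access all data files relative to DATADIR:
data_file = os.path.join(DATADIR, 'filename.dat')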
FAQ and hints
Instructions/hints
Request a course when you are sure you will use it. You can use the general use containers for writing notebooks before that point.
Don’t forget about the flexible ways of accessing your course data.
The course directory is stored according to the Science-IT data policy. In short, all data is stored in group directories (for these purposes, the course is a group). The instructor in charge is the owner of the group: this does not mean they own all files, but they are responsible for granting access and answering questions about what to do with the data in the long term. There can be a deputy who can also grant access.
To add more instructors/TAs, go to domesti.cs.aalto.fi and you can do it yourself. You must be connected to an Aalto network. See the Aalto VPN guide for help with connecting to an Aalto network from outside.
Store your course data in a git repository (or some other version control system) and push it to version.aalto.fi or some such system. git and relevant tools are all installed in the images.
You know that you are linked as an instructor to a course if, when you spawn that course's environment, you get the /course directory.
You can make a direct link that will spawn a notebook server, for example for a course with a slug of testcourse: https://jupyter.cs.aalto.fi/hub/spawn?profile=testcourse. If the user is already running a server, it will not switch to the new course; expect some subtle confusion with this and plan for it.
We have a test course which you can use as a sandbox for testing nbgrader and courses. No data here is private even after deletion, and data is not guaranteed to be persistent. Use it only for testing. Use the general-use notebook for writing and sharing your files (using git).
The course environments are not captive: students can install whatever they want. Even if we try to stop them, they can use the general-use images (which may get more software at any time) or download and re-upload the notebook files. Either way, autograding is done in the instructor's environment, so if you want to limit the software that students can use, this must be done at the autograding stage or via other hacks.
1) If you want to check that students have not used some particular Python module, have a hidden test that checks it hasn't been imported, like 'tensorflow' not in sys.modules. 2) Autograde in an environment which does not have these extra packages. Really, #2 is the only true solution; see https://github.com/AaltoSciComp/isolate-namespace for information on doing this.
In all cases, it is good practice to pre-import all modules the students are expected to use, and to tell students that other modules should not be imported. (A sketch of the hidden test from option #1 follows.)
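A minimal sketch of such a hidden test (option #1 above):

### BEGIN HIDDEN TESTS
import sys
# Fail if the student imported tensorflow anywhere in the notebook
assert 'tensorflow' not in sys.modules, 'tensorflow may not be used in this assignment'
### END HIDDEN TESTS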
Students should use you, not us, as the first point of contact for problems in the system. Please announce this to students. Forward relevant problems to us.
You can access your course files via SMB mounting at smb://jhnas.org.aalto.fi/course/$courseslug/files/ and the course data using smb://jhnas.org.aalto.fi/course/$courseslug/data/ (with Windows, use \\ instead of / and don't include smb://). This can be very nice for managing files, though it may mess up group-writability permissions. It will take up to half a day after you request your course before you can access the course files.
You are the data controller of any assignments which students submit. We do not access these assignments on your behalf, and a submission of an assignment is an agreement between you and the student.
You should always do random checks of a fair fraction of notebooks, to avoid unexpected problems.
You can tell what image you have using echo $JUPYTER_IMAGE_SPEC.
A notebook can tell if it is in the hub environment by checking whether the AALTO_JUPYTERHUB environment variable is set.
A notebook can tell if it is being autograded by checking whether NBGRADER_VALIDATING is set.
You can install an identical version of nbgrader to ours using pip install git+https://github.com/AaltoSciComp/nbgrader@live. This may be useful if you get metadata mismatch errors between your system and ours. There used to be more differences; these days the differences are minimal because most of our important changes have been accepted upstream.
You can get an environment.yml file of currently installed packages using conda env export -n base --no-builds. But note this is everything installed: you should remove everything from this file except what your assignments actually depend on, since being less strict increases the chances that it's reproducible. nbgrader should be removed (it pins to an unreleased development version which isn't available), and perhaps the prefix should be too. For the actual versions installed, see the base and standard dockerfiles in the singleuser-image repo.
FAQ
Something with nbgrader is giving an error in the web browser. Try running the equivalent command from the command line. That will usually give you more debugging information, and may tell you what is going wrong.
I see "Server not running … Would you like to restart it?" This error also happens during temporary network problems (even a few seconds, and it comes back); it doesn't necessarily mean that your server isn't running, but there is no way to recover automatically. If you see this message, refresh the page: if the server is still running, it recovers; if it's actually not running, it will give you the option to restart it; if there are still network problems, you'll see an error message saying so.
Gurobi Gurobi has license issues, and it’s not clear if it can even be distributed by us. So far, we only support open software.
But courses have used Gurobi before. They had students install it themselves, in the Anaconda environment, and somehow told it what the Aalto license server was. For example, using the magic of "!" shell commands embedded in notebooks, it was something like the following, which would automatically install Gurobi for students and set the license file information:
!conda install -c gurobi gurobi
!echo [license_file_information] > ~/.[license_file_path]
I have done a test release/fetch/autograde of an assignment, and I want to re-generate it. It says I can’t since there are already grades. You also need to remove it from the database with the following command. Note that if students have already fetched, they will need to re-fetch it so don’t do this if it’s already in the hands of the students - you will only create chaos (see the point below).
$ nbgrader db assignment remove ASSIGNMENT-ID
I have already released an assignment, and now I need to update it and release it again. Some students have already fetched it. This works easily if students haven't fetched it yet; if they have, it requires some manual work from them.
What you need to do: (make sure the old version is git-committed), edit the source/ directory version, un-release the assignment, generate it again, release the assignment again. You might need to force it to fetch the assignment again, if it has already been fetched. (verify, TODO: let me know how you do this)
On the student side: After an assignment is fetched, it won’t present the option to fetch it again (that would lose their work). Instead, they need to move the fetched version to somewhere else, then re-fetch. You can send the following instructions to your students:
I have updated an assignment, and you will need to re-fetch it. Your work won't be lost, but you will need to merge it into the new versions.
First, make sure you save everything and close the notebooks.
Open a terminal in Jupyter
Run the following commands to change to the course assignment directory and move the assignment to a new place (-old suffix on the directory name):
$ cd /notebooks/COURSE/
$ mv ASSIGNMENT_ID ASSIGNMENT_ID-old
In the assignment list, it should now offer you to re-fetch the assignment.
You can now open both the new and old versions (but to open the old version, you need to navigate to /notebooks/COURSE/ASSIGNMENT_ID-old yourself to see it).
If you have already submitted the assignment, submit again. The old assignment remains submitted, but our fetching should get the new one.
Contact
CS-IT. (Students: always contact your course instructors first.)
Chat via scicomp chat, https://scicomp.zulip.cs.aalto.fi, stream #jupyter.cs, for quick questions (don't send personal data here, it is public).
Issues needing action (new courses, autograding, software installation, etc.) go via the CS-IT email alias guru @ cs dot aalto.fi.
Realtime support via Triton, SciComp, RSE, and CS every day at 13:00; Wednesdays are the focus days, but some help may be possible on other days. This is good for screensharing to show a problem (you can prepare us by mentioning your issue in the chat first, and coordinate by chat to be sure).
More info
Noteable is a commercial service using nbgrader and has some good documentation: https://noteable.edina.ac.uk/documentation/
For source code and reporting issues, see the main jupyterhub page.
See the separate instructors guide. This service may be used either as general light computing for your students, or with nbgrader to release and collect assignments.
Privacy notice
Summary: This system is managed by Aalto CS-IT. We do not store separate accounts or user data beyond a minimal database of usernames and technical logs of notebooks, which are periodically removed (this is separate from your data). The actual data (your data, course data) is controlled by you and the course instructor respectively. We do not access data, but when necessary for the operation of the system we may see file metadata (stat FILENAME) such as permissions, size, timestamp, and filename. Your personal data may be deleted once it has been inactive for one year, and at the latest once your Aalto home directory is removed (after your Aalto account expires). Course data is controlled by course instructors.
See the separate privacy policy document for more details.
FAQ and bugs
I started the wrong environment and can't get back to the course selection list. In JupyterLab, use the menu bar, "Hub->Control Panel". In the classic notebook interface, use the "Control Panel" button on the top right. (Emergency backup: you can always change the URL path to /hub/home.)
Is JupyterLab available? Yes, and it's nice. There are two general-use instances that are actually the same; the only difference is that one starts JupyterLab by default and one starts classic notebooks by default.
Can I login with a shell? Run a new terminal within the notebook interface.
Can I request more software be installed? Yes, let us know and we will include it if it is easy. We aim to have featureful environments by default, but won’t go so far as to install large specialist software. It should be in standard repositories (conda or pip for Python stuff).
Can I do stuff with my class's assignments and not have it submitted? You have your personal storage space /notebooks/, which you can use for whatever you want. You can always make a copy of the assignment files there and play around with them as much as you want - even after the course is over, of course.
Are there other programming languages available? Currently there are Python, R, and Julia. More could be added if there is a good Jupyter kernel for it.
What can I use this for? Intended uses include anything related to courses, own exploration of programming, own data analysis, and so on (see Terms of Use above). Long-term background processing isn’t good (but it’s OK to leave small stuff running, close the tab, and come back).
When using nbgrader, how do I know what assignments I have already submitted? Currently you can’t beyond what is shown there.
Can I know right away what my score is after I submit an assignment with nbgrader? nbgrader is not currently designed for this.
Are there backups of data? Data storage is provided by the Aalto Teamwork system. There are snapshots available in .snapshot in every directory (in a shell, you have to ls this directory using its full name for it to appear the first time). This service is not designed for long-term data storage, and you should back up anything important because it will be lost after about one year or when your Aalto account expires. You should use git as your primary backup mechanism, obviously.
Is git installed? Yes, and you should use it. Currently you have to configure your username and email each time you use it, because this isn't persistent (because home directories are not persistent). Git will guide you through doing this. In the future, your Aalto directory name/email will be automatically set. As a workaround, run git config without the --global option in each repository.
I don't see "Assignment list". You have probably launched the general-use server instead of a course server. Stop your server and go spawn the notebook server of your course.
I’m getting an error code Here are the ones we know about:
504 Gateway error: The hub isn’t running in background. This may be hub just restarting or us doing maintenance. If it persists for more than 30 minutes, let someone know.
Stan/pystan/Rstan don't work. Stan needs to do a memory-intensive compilation when your program is run. We can't increase our memory limits too much, but we have a workaround: tell your program to use the clang compiler instead of the gcc compiler by setting the environment variables CC=clang and CXX=clang++. For R notebooks, this should be done for you. For RStudio, we don't know. For Python, put the following in your notebook:
import os
os.environ['CC'] = "clang"
os.environ['CXX'] = "clang++"
We should set this as the default, but want to be sure there are no problems first.
RStudio doesn’t appear. It seems that it doesn’t work from the Edge browser. We don’t know why, but try another browser.
I've exceeded my quota. You should reduce the space you use; the quota is 1GB. If this isn't enough and you actually need more for your classes, tell your instructor to contact us. To find large directories and files, open a terminal and run du -h /notebooks/ | sort -h, then clean up that stuff somehow, for example with rm -r. Note that .local/share/jupyter/nbgrader_cache in your home directory will continue to grow and eventually needs to be cleaned up - after the respective course is done.
I don't see the assignments for my course. There are different profiles you can start, and you can't tell which profile you have started. Go back to the hub control panel and restart your server. To be more precise, click "Control Panel" in the upper-right corner, then click "Stop My Server", wait a little bit, then click "Start My Server" and choose the profile for your course.
More info
Students, your first point of contact for course-related or Jupyter matters and bugs with JupyterHub should be your instructors, not us. They will answer questions and send the relevant ones to us. But if you can actively help with other things, feel free to comment via the Github repositories below.
The preferred way to send feedback and development requests is via Github issues and pull requests. However, we’re not saying it’s best to give Github all our information, so you can also send tickets to CS-IT.
Students and others who have difficulty in usage outside of a course can contact CS-IT via the guru alias.
Jupyter notebooks are not an end-all solution: for an entertaining look at some problems, see “I don’t like notebooks” by Joel Grus or less humorous pitfalls of Jupyter notebooks. Most of these aren’t actually specific to notebooks and JupyterLab makes some of the problems better, but thinking hard about the downfalls of notebooks makes your work better no matter what you do.
Our source is open and on Github:
single-user image (everything about a user’s environment)
server itself (logging in, course profiles, etc).
Local LLM web APIs
As a pilot service, Aalto RSE runs a service with some common open-source LLMs (llama2, mistral, etc.) available via the web. This can be used for lightweight purposes via programming, but shouldn't replace batch usage (use LLMs) or interactive chatting (use Aalto GPT).
Access
Currently this is not available publicly, but if you ask, we can provide development access. Chat with us in the #llms stream on Chat. That's also the best way to contact the developers (other contact methods are in Help).
The API doesn't have its own detailed documentation (ask us), but it should be OpenAI-compatible (for chat models), so many existing libraries work automatically.
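For example, something along these lines should work with the standard openai Python client (the base URL, token, and model name below are placeholders - ask us for the real values):

# Minimal sketch using the OpenAI-compatible chat API
import openai

client = openai.OpenAI(
    base_url='https://llm.example.aalto.fi/v1',  # placeholder URL
    api_key='YOUR-TOKEN-HERE',                   # placeholder token
)
response = client.chat.completions.create(
    model='mistral',  # ask us for the available model names
    messages=[{'role': 'user', 'content': 'Summarize: ...'}],
)
print(response.choices[0].message.content)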
Intended use and resource availability
This is ideal if you need to run things through LLMs which are only running on Aalto servers, and without many requests per second (this isn’t for batch use). This could, for example, be an alternative to running your own LLM server for basic testing or small question answering. It’s also good if you need to test various open-source LLMs out before beginning batch work. It’s perfectly suited for intermittent daily use.
Right now, each model has limited resources (some running on CPUs and some on GPUs). They can serve a request every few seconds, but the resources could easily be overloaded. We intend to add resources as needed, depending on use. For any serious use, please contact us so that we can plan for the future. Don't assume any stability or performance right now.
Technical implementation
Models run on local hardware on the Aalto University premises. Kubernetes is used to manage computing power, so in principle there is plenty of opportunity for scaling, but this is not turned on until a need is established. CPU resources are significant, but there are limited GPU resources (but that can change, depending on demand).
Standalone Matlab
General matlab hints: http://math.aalto.fi/opetus/Mattie/MattieO/matlab.html
Installation and license activation on staff-owned computers
The Matlab academic license permits installation on home computers for university personnel. Triton MDCS workers are available to anyone with a Triton account, which means the workers can be utilized from personal laptops as well.
Download image
Log into http://download.aalto.fi/ with your Aalto account. Look for the link "Software for employees' home computers", which will take you to the Matlab download links. Download the UNIX version for Linux and OS X, or the separate image for Windows.
The ISO image can be burned on a DVD or mounted on a virtual DVD drive.
Windows: Use MagicDisk or Virtual CloneDrive, or burn the image to a DVD. Double-click the setup.exe icon.
Linux:
# sudo mkdir /mnt/loop
# sudo mount -o loop Download/Matlab_R2010b_UNIX.iso /mnt/loop
# sudo /mnt/loop/install.sh
Mac OS X: Double click on InstallForMacOSX.app icon.
Installation steps
Select the installer options as shown in the screenshots.
Mathworks account is required to continue with the installation.
Enter your account information in the installer to log in. If you have lost your password, click the "Forgot your password?" option to receive your password by email. OR
Register to Mathworks with the installer.
Click on I need to create an account.
Enter your name and email address. To be recognized as Aalto academic user the email address must end in one of aalto.fi, tkk.fi, hut.fi, hse.fi, hkkk.fi or uiah.fi domains.
The installer will ask for an activation key, which is shown here in the last screenshot.
You may leave out unnecessary toolboxes and change the installation location. Remember however, that the Parallel Computing Toolbox is necessary to run any Matlab batch jobs on Triton.
Install Triton-MDCS integration scripts
Continue MDCS setup from Matlab Distributed Computing Server.
Stand-alone license activation on Aalto Linux laptops
To install Matlab and activate a stand-alone license on your Aalto Linux computer:
Install Matlab using the command:
pkcon install matlab
Run /opt/matlab2022a/bin/activate_matlab.sh (replacing 2022a with whatever version you are using).
Select "Activate automatically using the Internet" and press Next.
Select the license saying “individual” and press Next.
Enter your Aalto user name as the login name and press Next.
Press Confirm.
Without a stand-alone license, you can only run Matlab if you have an internet connection to the Aalto network, or any internet connection plus an Aalto VPN connection. With the stand-alone license, you can run Matlab even without an internet connection.
FAQ
Matlab freezes with Out of Memory errors
Q: Matlab freezes and I get errors like this. What should I do?:
Exception in thread "Explorer NavigationContext request queue" java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.mathworks.matlab.api.explorer.FileLocation.<init>(FileLocation.java:89)
at com.mathworks.matlab.api.explorer.FileLocation.getParent(FileLocation.java:126)
... ... ...
A1: Add more memory in Home -> Preferences -> General -> Java Heap memory
A2: Can you free up memory sooner in your code using the clear command? See https://se.mathworks.com/help/matlab/ref/clear.html
GPU acceleration?
Q: Is there functional GPU acceleration? Does the acceleration even work?
A1: Query the GPU device:
>> g = gpuDevice;
>> g
A2: Just query some feature:
>> fprintf('%s\n', g.ComputeCapability)
A3: Show multiple devices if found:
>> for ii = 1:gpuDeviceCount
g = gpuDevice(ii);
fprintf(1,'Device %i has ComputeCapability %s \n', ...
g.Index,g.ComputeCapability)
end
Open Source at Aalto
Note
This policy was developed at the Department of Computer Science, in conjunction with experts from Research and Innovation services (both the legal and commercialization sides) with the intention of serving the wider community.
After more research, we have learned that this policy is in fact de facto applicable to all of Aalto; it is just extremely unclear elsewhere that open source is actually allowed. Thus, this policy can be seen as best practices for all of Aalto. However, everyone (including CS) has more rights: one does not have to use this policy, you don't have to use an open source license, and IP ownership may be in more limited hands, so that you need fewer agreements to release.
However, we strongly encourage you to use this policy anyway. If you use this, you know that you are safe and have all permissions to make open source, regardless of your particular funding situation. It also ensures that you make proper open source software, for maximum benefit and open science impact.
References at bottom.
Researchers produce at least three primary outputs: publications, software, and data. This policy aims to make openly releasing all types of work as straightforward as the traditional academic publishing process.
This document describes the procedure for Aalto employees releasing the output of their work openly (open source software, data, and publications). Aalto University encourages openness. This policy covers only cases where work can clearly be released openly with no bureaucracy needed; it does not cover complex cases, such as commercial software, work related to inventions, complex partnership agreements, etc. The policy is voluntary: it provides a right to release openly, but does not require it or preclude any other university process. (Thus it's more of a guideline than a policy.) It is only relevant when the creator has an employment relationship with Aalto. If they don't (e.g. students), they own their own work unless there is some other agreement in place (e.g. their own funding contract, grant, etc.). Still, they can use this same process with no extra bureaucracy needed.
We realize that this policy does not cover all cases. We aim to cover the 99% case, and existing processes are used for complicated cases. Aalto Innovation Services provides advice on both commercialization and open source release.
This policy is for public licensing only (one to many). You must go through Research and Innovation Services for anything involving a multi-party agreement.
Why release?
The more people who see and build on our work, the more impact we can have. If this isn't enough: you get more citations and attention. While we can't require anything, we strongly encourage that all work is either made open source or taken through the commercialization process. If you don't know what to do, don't worry: they are not mutually exclusive. Proper open-source licensing can protect your ability to commercialize later. Talk to Innovation Services: they like open source, too, and will help you find the right balance. If work matches the criteria in this policy, it probably has limited commercial potential anyway: what is more important is your own knowledge and skills that went into it.
You want to add a proper open source license to your work, rather than just putting code on some webpage. Without a license, others cannot build on your code, limiting your impact: no one will build on your work, and eventually it rots and gets lost.
You always want to go through this process as soon as possible at the beginning of a project: if you don’t, it becomes much harder to track everyone down.
You shouldn’t release as open source (yet) if your work is intentionally commercial or contains patentable inventions. In these cases, contact Innovation Services. In the second case (patentable inventions), according to Finnish legislation you are actually required to report the invention to Innovation Services.
Traps and acting early
Intellectual property rights don’t give you the right to do anything - they give you the right to block others from doing something. Thus, it is very important that you don’t end up in a situation where others can block you, and that means thinking early.
Decide on a license as soon as possible. Once it goes into the repository, future contributors implicitly agree to it. Otherwise, you are stuck trying to find all past contributors and get their agreement.
Another common trap is grants that are not open-source friendly. Not many outright ban open source, but some require permission from all partners, and if there are many partners this becomes close to impossible. Ask in advance; in the worst case, you may simply not be able to write software during the time you are paid by these projects!
Step-by-step guide for release under this policy
Do these steps at the beginning of your project, not at the end!
Check if the work is covered under the “conditions for limited commercial potential” in the policy.
Choose a proper license to match your needs (see below for information). It must be open source, and you cannot transfer any type of exclusive license away - Aalto keeps full rights to future use.
Get the consent of all authors and their supervisors and/or funders. There are no particular requirements for how; the only need is to be able to prove it later in case a question ever arises. You should also make sure that your particular funding source / collaboration agreements don't place any further requirements on you. (For example, some grant agreements may say no GPL-type licenses without the consent of all partners.) Your advisor (and Research and Innovation Services) can help you with this.
If you are funded by Aalto basic funding, you by default have permission. Same goes for other big public funding agencies (Academy, EU… but the grant can always override this).
If you are in services, follow your source of funding. At the very worst, whoever is responsible for your funding can decide, but it may be someone lower too.
You are responsible for making sure that you have the right to release your code: for example, that there are no other agreements, other rights (intellectual property and privacy), legal restrictions, or anything else restricting a release. Also, any other included software must have compatible licenses.
Put a copyright license in the source repository. In the best case, each individual source file should list copyright and authors, but in practice if you don’t do this it’s not too much of a problem. Make sure that the license disclaims any warranty (almost all licenses will do this). After this, contributors implicitly consent to the license. If you have an important case, ask explicitly too. The important thing is that you have more evidence than the amount of scrutiny you might get (low in typical projects, will be higher if your project becomes more important).
This policy is seen as Aalto transferring the rights to release to you, not Aalto releasing the work itself (just the same as with publications). Release in your own name, but you can (and should) list your affiliation.
Make your code public if/when you want. No particular requirements here, but see below for best practices.
Any borderline or questionable cases should be handled by the existing innovation disclosure process.
In addition to the above requirements, the following are best practices:
You can’t require that people cite you, but you can ask nicely. Make it easy to do this! Include the proper citations directly in the README. Make your code itself also citeable by publishing it somewhere (Github, Zenodo, …).
Put on a good hosting location and encourage contributions. For example, Github is the most popular these days, but there are plenty of others. Welcome contributions and bug reports, and build on them. Make yourself the hub of expertise of your knowledge and methods.
Choosing a license
Under this policy, any Creative Commons, Open Source Initiative, or Free Software Foundation approved open source license is usable. However, you should not try to be creative: use the most common license that serves your needs.
Top-level recommendations:
Use this nice site: https://choosealicense.com/. It contains everything you need to know, including what is here. If you need something more specific you can have a look at http://oss-watch.ac.uk/apps/licdiff/.
MIT for software which should be basically public domain, Apache 2.0 for larger almost-public domain things (the Apache license protects more against patent trolling). Anyone can use this for any purpose, including putting it in their own proprietary, non-open products.
GNU General Public License (GPL) (“v2 or any later version”) for software which you may try to commercialize in the future. This license says that others can not make it closed-source without your consent. Others can use it for commercial purposes, but all derivative work must also be made open source - so you keep an advantage.
For special cases:
Lesser GNU General Public License (LGPL, GPL with classpath exception) type licenses. Suitable where the GPL would be appropriate but the software is a library. It can be embedded within other proprietary products, but the code itself must stay open.
The Affero GPL/LGPL. These get around the “webservice loophole”: if your code is available via a webservice, the code running it must stay open.
CC-BY for other non-software output.
Discussion:
Most public domain → MIT / Apache 2 > CC-BY > LGPL > GPL > AGPL → Most protection against proprietary use
If you think you might want to commercialize in the future: ask innovation services and they’ll help you release as open source now and preserve commercialization possibilities for the future.
The policy
Open Source Policy
Covered work
Software
Publications and other writing (Note that especially in this case, it is common to sign away full rights. This is a case where you do more than this policy says.)
Data
Conditions for limited commercial potential
This policy supports the release of work with limited commercial potential. Work with commercial potential should be assessed via Aalto’s innovation process.
If a work's entire novelty is equally contained in academic publications, there is usually little commercial value. Examples: code implementing algorithms, data handling scripts.
Similarly, work which is only a byproduct of academic publications or other work probably has limited commercial value, unless some other factor overrides. For example: analysis codes, blog posts, datasets, other communications.
Small products with limited independent value. If the time required to reproduce the work is small (one week or less), there is likely not commercial value. For example: sysadmin scripts, analysis codes, etc. Think about the time for someone else to reproduce the work given what you are publishing, not the time it took for you to create it.
Should a work be contributing to an existing open project, there is probably little commercial value. For example: contribution to existing open-source software, Wikipedia edits, etc.
NOT INCLUDED: Should work contain patentable elements or have commercial potential, this policy does not apply and it should be evaluated according to the Aalto innovation process. Patentable discoveries are truly new, non-obvious, useful inventions. In case of doubt, always contact Innovation Services! Indicators for this category: actually novel, non-obvious, useful, and actually an invention. Algorithms and math usually do not count, but expressions of these can.
NOT INCLUDED: Software designed for mass-market consumption or business-to-business use should be evaluated according to the Aalto innovation process. Indicators for this category: large amount of effort, software being a primary output.
Ownership of intellectual property rights at Aalto
This policy covers work of employees whose contracts assign copyright and other intellectual property rights of their work to Aalto. However, the Aalto rules for ownership of IP are extremely complicated, so see the last point.
Your rights are assigned to Aalto if you are funded by external funding, or if there are other Aalto agreements regarding your work.
If neither of the points in (2) apply to you AND your work is independent (self-decided and self-directed), then according to Finnish law you own all rights to your own work. You may release it how you please, and the rest of this policy does NOT apply (but we recommend reading it anyway for valuable advice). Aalto Innovation Services can serve you anyway.
Rather than figuring out the ownership of each work, this policy is written to apply to all work, so that you do not need to worry about this.
Release criteria and process
This policy applies to copyright only, not other forms of intellectual property. Should a work contain other intellectual property (which would not be published academically), this policy does not apply. In particular, this policy does not cover any work which contains patentable inventions.
The employee and supervisor must consider commercial potential. The guidelines in the “conditions for limited commercial potential” may guide you. Should there be commercial potential, go through the existing innovation disclosure processes. In particular, any work which may cover patentable inventions must be reported first.
If all conditions are satisfied, you, in consultation with your PI, supervisor, or project leader (whichever is applicable) and any funder/client requirements, may choose to release the work. Should the supervisor or PI have a conflict of interest or possible conflict of interest, their supervisor should also be consulted.
Depending on funding sources, you may have more restrictions on licensing and releasing as open source. Project proposals and grant agreements may contain provisions relevant to releasing work openly. When making project proposals, consider these topics already. When in doubt, contact the relevant staff.
To be covered under this policy, work must be licensed under an open / open source / free software license. In case of doubt, Creative Commons, Open Source Initiative, and Free Software Foundation approved open source licenses are considered acceptable. See below for some license recommendations.
All warranty must be disclaimed. The easiest way of doing this is by choosing an appropriate license. Practically all of them disclaim warranty.
All authors must consent to the release terms.
The employee should not transfer an exclusive license or ownership to a third party. Aalto maintains the right to use the work internally or commercially, and to re-license it should circumstances change.
Employees should acknowledge their Aalto affiliation, if this is possible and within the community norms.
This right should not be considered Aalto officially releasing any work, but allowing the creators to release it in their own name. Thus, Aalto does not assume liability or responsibility for work released in this way. Copyright owner/releaser should be listed as the actual authors.
Employees are responsible for ensuring that they have the right to license their work as open source, for example ensuring that all included software and data is compatible with this license and that they have permission of all authors. Also the release must be allowed by any relevant project agreements. Should you have any doubts or concern, contact Innovation Services.
To apply this to your work, first receive any necessary permissions. In writing, by email, is sufficient. Apply the license in your name, but list Aalto University as an affiliation somewhere that makes sense. Do not claim any special Aalto approval for your work.
For clarity, raw official text is separate from the guidance on this page. Current approvals: Department of Computer Science (2017-03-17).
How to run a good open-source software project
One of the largest benefits to open source is having a community of people contributing back to you. To do this, you need to have a good environment. Open development, good style and a basic contribution guide, and encouragement is the base of this. Eventually, this section may contain some more pointers to how to create this type of community. (TODO)
References
CSC open source policy, with similar practical effects to what we have here.
Aalto Research Data Management Policy: https://inside.aalto.fi/download/attachments/43223812/2016_02_10_datapolicy.pdf?version=1&modificationDate=1455967763618&api=v2
Aalto IP guide: FI EN: contains evidence that this policy is applicable to all Aalto.
Aalto Innovation Services: https://innovation.aalto.fi/
Choosing an open source license: https://choosealicense.com/
Aalto copyright advice: http://copyright.aalto.fi/
Practical guidelines for Open Source Projects: forthcoming, 2017
Overleaf
Aalto provides a professional site license to the whole community. For more information, see https://www.overleaf.com/edu/aalto.
In order to link yourself to Aalto, you must register for and have an ORCID [wikipedia]. An ORCID ("Open Researcher and Contributor ID") is a permanent ID used for linking researchers to their work; for example, some journals require linking to an ORCID. ORCID can be accessed directly with your Aalto account.
TODO: determine exact procedure and update here
Aalto rates overleaf as for “public” data. This doesn’t mean that Overleaf makes your data public, but just that Aalto can’t promise security. In reality, you decide if Overleaf is secure enough. If there is some legal requirement for security, you probably shouldn’t use Overleaf. If there is a collaborator requirement for security, then you must make your own choice if Overleaf is suitable.
Paniikki: Computer Lab For Students
Paniikki is a cutting-edge computer lab of the Computer Science department. It is located in the T-building, room C106 (right under lecture hall T1). This documentation is a Paniikki cheatsheet.

< The blue box at the entrance is Paniikki >
For more services directed at students, see Welcome, students!.
The name
Paniikki is Finnish for "panic" - a fascinating name, since people in a panic are often in Paniikki. We don't know which comes first, the space or the emotion; people seem to experience both simultaneously.
Access
Physical
You can access Paniikki in the T-building C106. It is right by the building’s main entrance (you can see it through the windows by the building’s main entrance).
Remote
You can ssh via the normal Aalto shell servers kosh and lyta. Going through them, you can then ssh to one of the Paniikki computers. Be warned, there is no guarantee that you get an empty one: if it seems loaded (use htop to check), try a different one. You can find the hostnames of the Paniikki computers on aalto.fi.
Hardware
CPU properties | Spec
---|---
Model | Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
Architecture | x86_64
CPU(s) | 12
Thread(s) per core | 2
max MHz | 4000.0000
Virtualization | VT-x
L1d cache | 32K
L1i cache | 32K
L2 cache | 256K
L3 cache | 15360K

GPU properties | Spec
---|---
Model | NVIDIA Quadro P5000
Core | GP104GL (Pascal-based)
Core clock | 1607 MHz
Memory clock | 1251 MHz
Memory size | 16384 MiB
Memory type | 256-bit GDDR5X
Memory bandwidth | 320 GB/s
CUDA cores | 2560
CUDA compute capability | 6.1
OpenGL | 4.5
OpenCL | 1.2
Near GeForce Model | GeForce GTX 1080

Memory properties | Spec
---|---
RAM | 32GiB
Software
First things first: you don't have sudo rights on Aalto classroom machines, and you can't have them, because the machines are shared. We provide the most-used software, and if you need more you can inquire via servicedesk@aalto.fi. We try to have a good base software set that covers most people's needs.
What? | How?
---|---
Python via Anaconda | module load anaconda
Python (system) | Default available
Tensorflow | In the Python environments, e.g. anaconda above
Modules
In short, module is a software environment management tool. With module you can manage multiple versions of software easily. Here are some sample commands:
Command | Description
---|---
module load NAME | load module
module avail | list all modules
module spider PATTERN | search modules
module spider NAME | show prerequisite modules to this one
module list | list currently loaded modules
module show NAME | details on a module
module help NAME | details on a module
module unload NAME | unload a module
module save ALIAS | save module collection to this alias (saved in ~/.lmod.d/)
module savelist | list all saved collections
module describe ALIAS | details on a collection
module restore ALIAS | load saved module collection (faster than loading individually)
module purge | unload all loaded modules (faster than unloading individually)
There are some modules set up specifically for different courses: if you just load the environment with “module load”, you will have everything you need.
Read the details in Module environment page.
Example 1
Assume we are in Paniikki and want to do our homework for CS-E4820 Machine Learning: Advanced probabilistic methods. In this course, students use Tensorflow and Edward.
# Check available modules
$ module load courses/ # Tab to auto-complete
# Finally you will complete this
$ module load courses/CS-E4820-advanced-probabilistic-methods.lua
# Check the module you loaded
$ module list
Currently Loaded Modules:
1) courses/CS-E4820-advanced-probabilistic-methods
# Check the packages
$ conda list # You will see Tensorflow and etc.
# Launch Jupyter
$ jupyter notebook
# Do your homework
# You are done and want to un-load all the modules?
$ module purge
Example 2: General Python software
Need Python and general software? The anaconda modules have Python, a bunch of useful scientific and data packages, and machine learning libraries.
# Latest Python 3
$ module load anaconda
# Old Python 2
$ module load anaconda2
Example 3: List all software
You can check all other modules as well
$ module avail

< Available modules in Paniikki as of 2018 March 8th >
You want to use Matlab?
$ module load matlab/2017b
$ matlab
Questions?
If you have any question please contact servicedesk@aalto.fi and clearly mention the Paniikki classroom in the message.
Python on Aalto Linux
The scientific Python ecosystem is also available on Aalto Linux workstations (desktops), including the anaconda (Python 3) and anaconda2 (Python 2) modules providing the Anaconda Python distribution. For a more in-depth description, see the generic Python page under the scientific computing docs.
On Aalto Linux laptops, these instructions don't work. Instead, we recommend installing Anaconda or Miniconda yourself, after which you can manage packages via environments. You can also install Python packages through the package manager, but that can cause problems with installing your own libraries if not managed carefully.
Anaconda on Aalto Linux
You can mostly use Python like normal - see Python.
To create your own anaconda environments, first load the Anaconda module:
$ module load anaconda
then you get the conda command. If you get an error such as:
NotWritableError: The current user does not have write permissions to a required path.
path: /m/work/modules/automatic/anaconda/envs/aalto-ubuntu1804-generic/software/anaconda/2020-04-tf2/1b2b24f2/pkgs/cache/18414ddb.json
Try the following to solve it (this prevents conda from trying to store its downloaded files in the shared directory):
$ conda config --prepend pkgs_dirs ~/.conda/pkgs
The “neuroimaging” environment
On the Aalto Linux workstations and Triton, there is a conda environment which contains an extensive collection of Python packages for the analysis of neuroimaging data, such as fMRI, EEG and MEG.
To use it on Aalto Ubuntu workstations and VDI:
$ ml purge
$ ml anaconda3
$ source activate neuroimaging
To use it on Triton:
$ ml purge
$ ml neuroimaging
To see the full list of packages that are installed in the environment, use:
$ conda list
Some highlights include:
Basic scientific stack
numpy
scipy
matplotlib
pandas
statsmodels
fMRI:
nibabel
nilearn
nitime
pysurfer
EEG/MEG:
mne
pysurfer
Machine learning:
scikit-learn
tensorflow
pytorch
R:
rpy2 (bridge between Python and R)
tidyverse
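To quickly check that the environment is active, you could try importing one of the packages listed above, for example:
$ python -c "import mne; print(mne.__version__)"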
Finally, if you get binaries from the wrong environment (check with which BINARYNAME), you may need to update the mappings with:
$ rehash
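(rehash is a csh/zsh builtin; in bash, the equivalent is hash -r.)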
MNE Analyze
Note: this was tested only on NBE workstations. If you wish to run mne_analyze from your workstation, you should follow this procedure. Open a new terminal and make sure you have the bash shell (check with echo $SHELL; if you do not have it, just type bash) and then:
$ module load mne
$ source /work/modules/Ubuntu/14.04/amd64/common/mne/MNE-2.7.4-3434-Linux-x86_64/bin/mne_setup_sh
$ export SUBJECTS_DIR=PATHTOSUBJECTFOLDER
$ export SUBJECT=SUBJECTID
$ mne_analyze
Please note that the path in the "source" command may change with more up-to-date versions of the tool, and that "PATHTOSUBJECTFOLDER" and "SUBJECTID" are specific to the data you have. Please refer to the MNE documentation for more help on these.
Mayavi
If you experience problems with the 3D visualizations that use Mayavi (for example MNE-Python’s brain plots), you can try forcing the graphics backend to Qt5:
For the Spyder IDE, set Tools -> Preferences -> IPython console -> Graphics -> Backend: Qt5
For IPython consoles, append c.InteractiveShellApp.matplotlib = 'qt5' to the ipython_config.py and ipython_kernel_config.py configuration files. By default, these can be found in ~/.ipython/profile_default/.
In Jupyter notebooks, execute the magic command %matplotlib qt5 at the beginning of your notebook.
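For example, you could append the setting to both configuration files from the shell (a sketch, assuming the default profile location mentioned above):
$ echo "c.InteractiveShellApp.matplotlib = 'qt5'" >> ~/.ipython/profile_default/ipython_config.py
$ echo "c.InteractiveShellApp.matplotlib = 'qt5'" >> ~/.ipython/profile_default/ipython_kernel_config.py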
Installation of additional packages
The “neuroimaging” environment aims to provide everything you need for the
analysis of neuroimaging data. If you feel a package is missing that may be
useful for others as well, contact Marijn van Vliet. To quickly install a package in your home folder, use pip install <package-name> --user.
Remote Jupyter Notebook on shell servers
See also
We now have a General use student/teaching JupyterHub installation which may serve your needs more simply.
Here we describe how you can utilise Aalto computing resources for Jupyter Notebook remotely. The guide is targeted at UNIX users at the moment.
Aalto provides two "light computing" servers: brute.org.aalto.fi and force.org.aalto.fi. We demonstrate how to launch a Jupyter Notebook on brute and access it on your laptop.

< System activity on Brute >
ssh username@brute.org.aalto.fi
# Create your Kerberos ticket
kinit
# Create a session. I use tmux
tmux
# Load Anaconda
module load anaconda
# Create your env
conda create -n env-name python=3.6 jupyter
# Activate your python environment
source activate env-name
# Launch jupyter notebook in headless mode and a random port number
jupyter notebook --no-browser --port=12520
Note
You might get messages like "The port 12520 is already in use, trying another port" while starting the notebook server. In that case, take note of the port the server is actually running on, e.g.:
[I 15:42:14.187 NotebookApp] The Jupyter Notebook is running at:
[I 15:42:14.187 NotebookApp] http://localhost:12470/?token=kjsahd21n9...
and replace “12520” below with the correct port number, 12470 in this case.
Now back to your laptop
# Forward the port
ssh -L 12520:localhost:12520 -N -f -l username brute.org.aalto.fi
Now launch your browser and go to http://localhost:12520 with your token.
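If you do this often, a host alias in your ~/.ssh/config saves typing (a sketch; the alias name, username, and port are illustrative):
# ~/.ssh/config
Host brute-jupyter
    HostName brute.org.aalto.fi
    User username
    LocalForward 12520 localhost:12520
After that, ssh -N -f brute-jupyter sets up the same forwarding.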
Zulip
See also
Instructors, see the relocated instructor page at Zulip for instructors.
Aalto Scicomp Zulip - researcher and staff discussion
If you are a researcher looking for the ASC chat for help and support, see the chat help section or log in directly at https://scicomp.zulip.cs.aalto.fi .
Zulip is an open-source chat platform, which CS hosts at Aalto. It is used as a chat platform for some courses, and allows better student and chat privacy.
The primary distinguishing feature of Zulip is topics, which allow one to make order out of a huge number of messages. By using topics, you can narrow to a certain thread of conversation while not losing sight of the overall flow of messages.
Zulip for instructors
Introduction
Zulip is an online discussion tool with LaTeX support. It has been used by some Aalto teachers as an external service on individual courses. For spring and summer 2021, Zulip was provided by Aalto CS as a pilot solution for all School of Science departments' course needs. For autumn 2021 and spring 2022, the pilot at SCI continues and is widened on a small scale to other schools as well. The pilot refers to a) a fixed-term project with clear lifecycle needs, as in courses which start and end at certain times and after which the Zulip instance can be deleted; b) a transitional period between the current state and possible production use or a change to other solutions; and c) a basic solution without all the fancy features or user interface. During the pilot, users are expected to provide feedback, which will affect the decision-making on future solutions and the development of usability.
CS-IT hosts the Zulip chat instances for you. These chat instances are hosted at <chat-name>.zulip.aalto.fi (or, for older instances, <chat-name>.zulip.cs.aalto.fi). Login to the chats is available with Aalto accounts. Email registration for external users is also possible via invitations. After logging in for the first time with an Aalto account, if no matching Zulip account was found, you are prompted to "Register" and create one. Once the Zulip account has been created, it should be linked to your Aalto credentials.
Internal or confidential matters should not be discussed on the platform.
Get started / request Zulip
Note
Chat realms can be requested using the form at https://zulip.aalto.fi/requests/.
Note
If you encounter issues, report them to CS-IT or on #zulip-support at scicomp.zulip.cs.aalto.fi
You can also give/discuss feedback, complaints or suggestions on #zulip-feedback at scicomp.zulip.cs.aalto.fi
Note
You can test out Zulip at testrealm.zulip.cs.aalto.fi. Use the Aalto login. This chat is for testing only.
After you have received the chat instance
Within a few days of requesting an instance, you should receive the details for your chat instance by email. After this you:
Can login to the chat instance <chat-instance>.zulip.cs.aalto.fi with your Aalto account
Should already have the owner role assigned.
Can configure the chat instance from (cog wheel in the top-right corner) -> Manage organization
Please carefully read the Configuration section before making changes
Can appoint more admins/owners (e.g. TAs)
Ask them to login first
Change their role from Manage organization -> Users
Configuring your organization
Below are listed the most important settings found under Manage organization in Zulip. There is no easy way for us to enforce these, so it is your responsibility as organization owner or admin to make sure they are set correctly. Make sure any owners/admins you appoint are aware of these as well.
Note
Settings that are not mentioned here can be configured to your liking. However, you should still exercise care, since you are responsible for the service and the safety of your users' data. If you would like advice, please ask us.
Organization settings / Video chat provider
Set to None.
The default provider (Jitsi) has not been evaluated or approved by Aalto
Integration with Aalto Zoom may come later on
Organization permissions / Invitation settings
Do not set both "Organizational Permissions → Invitations = not required" and "Authentication methods → Email = enabled" at the same time.
You can allow signup by Aalto account or any email. You can allow anyone to sign up or make it invitation only. But you cannot set "Anyone with an Aalto account may sign up without invitation, but by email you must be invited" (a Zulip limitation). So, we have to work around this, otherwise bots and random people might join your chat. If the chat needs to include external users, make it invite only.
The exact questions and answers:
Are invitations required for joining in the organization?
If you are only allowing Aalto login (see 'Authentication methods' below): can be set to No,… (but still, anyone with an Aalto account can join).
If you are allowing external users/email registration (see 'Authentication methods' below): set to Yes, only admins can send invitations. (You can invite people via their Aalto email address for Aalto login.)
Organization permissions / Who can access user email addresses
Set this to Admins only or Nobody.
Organization permissions / Who can add bots
Set to Admins only.
Consult CS-IT before deploying any bots.
Authentication methods
AzureAD
This is Aalto Login and should be enabled
Email
This allows users to register using an email address
We cannot allow random people or bots to register freely
If you enable this, make the chat invitation only, as described in 'Invitation settings' above, for the reason described there.
Users
You can manage users here.
Please be careful with who you assign admins/owners. These roles should be only given to course staff.
The "moderator" role has extra permissions assigned, such as managing streams and renaming topics. This could be good for course staff/TAs.
Other settings, up to you
You can allow messages to be edited for longer using Settings → Organization Settings. It is often useful to set this to a longer period.
Practical hints
There is a fine line between a discussion platform and chat, normal chat and topic-based chat, and chaos and order. Here, we give suggestions for you, based on what other teachers have learned.
Topics (basically, like the subject of a message thread) are the key feature of Zulip. They are explained more below, but basically they keep things organized. If you don't want to do that or it doesn't match your flow, you won't like the model.
Read the guidelines for students to see the importance of topics and the three ways to use Zulip, and how we typically manage the flood of information in practice.
Give these guidelines to your students (copy and paste from the student page).
Consider why you want a course chat.
Do you want a way to chat and ask questions/discuss in a lower-threshold platform than forum posts? Then this could be good.
Do you want a Q&A forum or support center? Then this may work, but would MyCourses be a better forum?
Do you want a place for students groups to be able to chat among small groups?
Do you mainly want announcements? Then maybe simply use MyCourses?
Create your channels ("streams") before your students join, and make the important ones default streams (this is done under "Manage organization"), so that everyone will be subscribed (since people will always forget to join streams).
If you do create a new default stream later, use the “clone subscribers” option to clone from another default stream, so that everyone will be subscribed.
Some common streams you might want are #general, #announcements, and #questions. Some people have one stream per homework, exam, theme, and/or task.
The main point of streams is to be able to independently filter, mute, and subscribe to notifications. For example, it might be useful to view all questions about one homework in order, or request email notifications from the #announcements stream.
You can create user groups (teams) with a certain name. The group can be @-mentioned together, or added to a stream.
Moderators (and others) can organize other people's messages by topic. Edit the message to do this, including other people's. The hotkey is e.
If you want a Q&A forum, make a stream called #questions, or smaller streams for specific topics, and direct students there. You can click the check mark by a topic to mark it as resolved.
Remind students to make a new topic for each new question. This enables good follow-up via “Recent topics”
If students don’t make a new topic (or a topic goes off-track), edit the message and change the topic (change topic for “this message and all later messages”). Then, you keep questions organized, findable, and trackable.
If you don't want to be answering questions in private messages (who does?… it leads to duplicate work), make a clear policy of either reposting the questions publicly yourself (without identification), or directing the students to repost in the public stream themselves.
If you want to limit students to not be able to do anything, you can consider disabling:
Adding streams, adding others to streams (if you want people to only ask and not make their own groups).
Disable private messages (if you really don’t want personal requests for help).
Adding bots, adding custom emojis.
Seeing email addresses. Changing their name.
On the other hand, you might want to “allow message editing” to a much longer period and allow message deleting. For Q&A these are quite useful to have.
You can use the /poll [TITLE] command to make lightweight non-anonymous polls. For anonymous polls, someone has used a bot called Errbot, but we don't currently know much about that.
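For example, sending this as a message creates a poll (the title is illustrative):
/poll Which exercise session time suits you best?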
FAQ
Is there an easier way than subscribing students manually when streams are created? Yes, you should never be doing that manually. See above for cloning membership of a stream from another.
Isn’t it too much work to have to give a topic to every message? Well, you don’t have to when replying. And this is sort of a natural trade-off needed to keep things organized and searchable: you have to think before you send. Most people consider this a worthy trade-off. Note that you can change the topic of messages after the fact, just talk and organize later as needed.
Extra requested features
(see also the student page)
Anonymous polls (a pull request exists with this feature)
Anonymous discussion
More fine-grained permissions for TAs. DONE: moderator role now exists.
Support for bots and other advanced features (more like permission to recommend them, bot support works very well already).
Pinned topics (pull request exists, high-priority issue, #19483).
Long-term invitations (upcoming, high-priority issue, #20337)
Basics
Streams and Topics
In Zulip, discussions are organized in streams, which are further divided into topics.
Views
The left sidebar lets you narrow down the messages that are displayed; you can select:
All messages, to efficiently see everything that is being posted.
Recent topics, to see which topics have new information.
Different streams and topics, to narrow down to a specific stream or topic.
Recent topics is good to manage a flood of information (see what’s new, click on relevant stuff, ignore all the rest). All messages is better when you are caught up and want to make sure you don’t miss anything. Viewing single topics and streams is good for catching up on something you don’t remember.
Of course, everyone has their own ways and workflows so you should experiment what works best and which views are useful for you.
Message Pane
In the middle of your screen, you have the Message Pane, where the messages are shown.

Message Pane. This is the basic view of messages. You can click on various places to narrow your view to one conversation or reply.
Selecting visible topics
Not all streams are visible in the sidebar by default.
Click the gear icon above the channel list in order to see all available streams and select which ones you want to participate in. It is good to occasionally look at this menu in case new streams are added.

Recent topics, another view of recent activity that shows activity per-topic.
Hints on using Zulip efficiently
How to ask a question
Seems obvious, doesn’t it? You can get the best and fastest answers by helping to keep things organized. These recommendations are mainly for Q&A-forum type chats.
First, search history to see if it has already been asked.
If so, click on the topic name. You will narrow your view to see that entire conversation.
If your question isn’t answered yet, but is a follow up to an existing topic, click on a message in that topic. Then, when you ask, it will go to that same topic as a follow-up, and anyone else can narrow to see the whole history.
Replying to an existing topic.
Unlike other chats, your message will not get lost, and people will both see that it is new and can see the history of that thread.
Your course can say what the threshold for “new topic” is. Maybe they would have one topic per question pre-created or something clever like that.
If you don’t find anything relevant to follow up on, make a new topic.
Making a new topic.
Select the stream you want to post to (whatever fits best).
Click “New topic”.
Enter the topic name down below: a few words, like an email subject. For example,
week 1 question 3
,integrals of complex functions
,exam preparation
.Enter your message and send.
Others (or you…) can split or join topics if they want by going to “edit message”, so there is no risk of doing something wrong. Don’t worry, just ask!
By being organized, you can get both the benefits of quick chat with the organization of not missing anything.
Other hints
You can format your messages using Zulip markdown.
Are you annoyed by having to enter a topic every time you send a message? Remember, when replying you don't need to. But otherwise, it's a trade-off: take a moment to keep things organized, or be less searchable. Most users find that keeping things organized is worth it. But don't worry too much: if you happen to get things wrong, others can re-organize topics afterwards.
“Mute a stream” (or topic) is useful when you want to stay subscribed but not be notified of messages by default. You can still find it if you click through the sidebar.
Since Zulip 8.0, you can mute/default/follow (receive notifications) per-topic, for every topic (instead of only muting a topic). This is very powerful. Note that you can change the default behaviour (when a stream is automatically followed) in your Notification Settings. You might want to adjust the default.
You can also request notifications for everything in a certain stream. This could be good for announcement streams, or your particular projects.
The desktop and mobile apps can support multiple organizations. At least on mobile apps, switching is kind of annoying.
Apps
There are reasonable applications for most desktop and mobile operating systems. These don’t send your data to any other services.
The mobile applications work, but may not be the best for following a large number of courses simultaneously. We can’t currently make improvements in them.
Open issues
We are aware of the following open issues:
It is annoying to have one chat instance per course (but it seems to be standard in chats these days).
There are no mobile push notifications (it would cost too much to use the main Zulip servers, and we haven’t decided to build our own apps yet. info).
Likewise with built-in video calls (via https://meet.jit.si or Zoom).
Various user interface things. But Zulip is open-source, so feel free to contribute to the project…
Data management
In this section, you can find some information and instructions on data management.
Data
Data connects most research together. It's easy to create in the short term, but in the long term it can become so chaotic that it loses its value. Can you access your research group's data from 5 years ago and use it?
Data storage in Aalto
Data in Science-IT departments
Requesting Science-IT department storage space
Existing data groups and responsible contacts:
PHYS:
Aalto: Aalto IT servicedesk
Requesting to be added to a group
Note
CS department: New! Group owners/managers can add members to their groups self-service. Go to https://domesti.cs.aalto.fi from Aalto networks, over VPN, or remote desktop at https://vdi.aalto.fi, and it should be obvious.
Send an email to the responsible contact (see above) and CC the group owner or responsible person, and include this information:
Group name that you request to join
copy and paste this statement, or something similar: “I am aware that all data stored here is managed by the group’s owner and have read the data management policies.”
Ask the group owner to reply with confirmation.
Do you need access to scratch or work? If so, you need a Triton account and you can request it now. If you don’t, you’ll get “input/output error” and be very confused.
Example:
Hi, I (account=omes1) would like to join the group myprof. I am aware that all data stored here is managed by the group's owner and have read the data management policies.
$professor_name, please reply confirming my addition.
Requesting a new group
Send an email to the responsible contact (see above) with the following information. Group owners should be long-term (e.g. professor level) staff.
Requested group name (you can check the name from the lists below)
Owner of data (prof or long-term staff member)
Other responsible people who can authorize adding new members to the group (they can reply and say "yes" when someone asks to join the group).
Who is responsible for the data should you become unavailable (default: your supervisor, who is probably the head of department).
Initial members
Expiration time (default=max 2 years, extendable. max 5 years archive). We will ping you for management/renewal then.
Which filesystems and what quota (project, archive, scratch). See the storage page.
Basic description of purpose of group.
Is there any confidential or personal data (see above for disclaimer).
Any other notes that CS-IT should enforce, for example check NDA before giving access.
Example:
I would like to request a new group coolproject. I am the owner, but my postdoc Tiina Tekkari can also approve adding members. (Should I become unavailable, my colleague Anna Algorithmi (also a professor here) can provide advice on what to do with the data.)
We would like 20GB on the project filesystem.
This is for our day to day work in algorithms development; we don't expect anything too confidential.
Science-IT department data principles
Note
Need a place to store your data? This is the place to look. First, we expect you to read and understand this information, at least in general. Then, see Requesting Science-IT department storage space.
This page is about how to handle data - not the raw storage part, which you can find at data storage. Aalto has high-level information on research data management, too.
What is data management?
Data management is much more than just storage. It concerns everything from data collection, to data rights, to end-of-life (archival, opening, etc). This may seem far-removed from research practicalities, but funding agencies are beginning to require advanced planning. Luckily, there are plenty of resources at Aalto (especially in SCI), and it’s just a matter of connecting the dots.
Oh, and data management is also important because without data management, data becomes disorganized, you lose track, and as people come and go, you lose knowledge of what you have. Don’t let this happen to you or your group!
Another good starting point is the Aalto research data management pages. These pages can also help with preparing a data management plan.
Data management is an important part of modern science! We are here to help. These pages both describe the resources available at Aalto (via Science-IT), and provide pointers to issues that may be relevant to your research.
Data storage at Aalto SCI (principles and policies)
Note
This especially applies to CS, NBE, and PHYS (the core Science-IT departments). The same is true for everyone using Triton storage. These policies are a good idea for everyone at Aalto, and are slowly being developed at the university level.
Most data should be stored in a group (project) directory, so that multiple people can access it and there is a plan for after you leave. Ask your supervisor/colleagues what your group's existing groups are and where the data is stored. Work data should always be stored in a project directory, not personal home directories. See below for how to create or join a group. Home directory data cannot be accessed by IT staff, according to law and policy - data there dies when you leave.
All data in group directories is considered accessible to all members (see below).
All data stored should be Aalto or research related. Should there be questions, ask. Finnish law and Aalto policies must be followed (in that order), including by IT staff. Should there be agreements with third-parties regarding data rights, those will also be followed by IT staff, but these must be planned in advance.
All data must have an owner and lifespan. We work with large amounts of data from many different people, and data without clear ownership becomes a problem. ("Ownership" refers to decision-making responsibility, not IPR ownership.) Also, there must be a clear successor for when people leave or become unavailable. By default, this is the supervisor.
Personal workstations are considered stateless and, unless there is special agreement, could be reinstalled at any time and are not backed up. This should not concern day to day operations, since by default all data is stored on network filesystems.
We will, in principle, make space for whatever data is needed. However, it is required that it be managed well. If you can answer what the data contains, why it's stored, how the space is used, and why it's needed, it's probably managed well for these purposes.
Read the full Science-IT data management policy here.
Information on all physical locations and how to use them is on the storage page.
Groups
Everywhere on this page, "group" refers to a certain file access group (such as a unix group), not an organizational (research) group. They will often be the same, but there can be many more access groups made for more fine-grained data access.
Data is stored in group directories. A group may represent a real research group, a specific project, or specific access-controlled data. These are easy to make, and they should be extensively used to keep data organized. If you need either finer-grained or more wide data access, request that more groups are made.
Please note that, by design, all project data is accessible to every member in the group. This means that, when needed, IT can fix all permissions so that all group members can read all data. For access control more fine-grained than these project groups, please have a separate group created. Data in a group is considered "owned and managed" by the group owner on file. The owner may grant access to others and change permissions as needed. Unless otherwise agreed, any group member may also request permissions to be corrected so that everyone in the group has access.
Access control is provided by unix groups (managed in the Aalto active directory). There can be one group per group leader, project, or data that needs isolation. You should use many groups, they make overall management easier. A group can be a sub-group of another.
Each group can get its own quota and filesystem directories (project, archive, scratch, etc.). Quota is per-filesystem. Tell us the requested quota when you set up a project.
A typical setup would be: one unix group for a research group, with more groups for specific projects when that is helpful. If there are fixed multi-year projects, they can also get a group.
Groups are managed by IT staff. To request a group, mail us with the necessary information (see bottom of page).
Each group has an owner, quota on filesystems, and some other metadata (see below).
Group membership is per-account, not tied to employment contracts or HR group membership. If you want someone to lose access to a group you manage, they have to be explicitly removed by the same method they were added (asking someone or self-service, see bottom of page).
To have a group created and storage space allocated, see below.
To get added to a group, see instructions below.
To see your groups: use the groups command or groups $username.
To see all members of a group: getent group $groupname
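For example (group names, IDs, and members are illustrative):
$ groups
users myprof coolproject
$ getent group coolproject
coolproject:*:12345:user1,user2,user3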
Common data management considerations
Organizing data
This may seem kind of obvious, but you want to keep data organized. Data is always growing in volume and variety, so if you don’t organize it as it is being made, you have no chance of doing it later. Organize by:
Project
To be backed up vs can be recreated
Original vs processed.
Confidential or not confidential
To be archived long-term vs to be deleted
Of course, make different directories to sort things. But also the group system described above is one of the pillars of good data organization: sort things by group and storage location based on how it needs to be handled.
Backups
Backups are extremely important, not just for hardware failure, but consider user error (delete the wrong file), device lost or stolen, etc. Not all locations are backed up. It is your responsibility to make sure that data gets stored in a place with sufficient backups. Note that personal workstations and mobile devices (laptops) are not backed up.
Openness
Aalto strongly encourages sharing data openly or under controlled access, with a goal of 50% of data shared by 2020 (see the Aalto RDM pages). In short, Aalto says that you "must" make strategic decisions about openness for the best benefits (which practically probably means you can do what you would like). Regardless, being open is usually a good idea when you can: it builds impact for your work and benefits society more.
Zenodo (https://zenodo.org/) is an excellent platform for sharing data, getting your data cited (it provides a DOI), and controlling what you share with different policies (https://about.zenodo.org/policies/). For larger data, there are other resources, such as IDA/AVAA provided by CSC (see below).
There are lists of data repositories: r3data, and Nature Scientific Data’s list.
Datasets can and should also be listed on ACRIS, just like papers - this allows you to get credit for them in the university’s academic reporting.
Data management plans
Many funders now require data management plans when submitting grants. (Aside from this, it's useful to think practically about how you'll deal with data.)
Please see:
Summary of data locations
Below is a summary of the core Science-IT data storage locations.
Solution | Purpose | Where available? | Backup? | Group management?
---|---|---|---|---
project | Research time storage for data that requires backup. Good for e.g. code, articles, other important data. Generally for a small amount of data per project. | Workstations, Triton login node | Weekly backup to tape (to recover from major failure) + snapshots (recover accidentally deleted files). Snapshots go back hourly/daily/weekly (see Filesystem details below). | yes
archive | Data with a longer life than project. Practically the same, but better to sort things out early. Also longer snapshots and guaranteed to get backed up to tape. | Workstations, Triton login node. /m/$dept/archive/$group. | Same as above | yes
scratch (group based) / work (per-user) | Large research data that doesn't need backup. Temporary working storage. Very fast access on Triton. | /m/$dept/scratch/$groupname, /m/$dept/work/$username | no | scratch: yes, work: no
See data storage for full info.
Requesting data storage space
Filesystem details
This page gives details of available data storage spaces, with an emphasis on scientific computing access on Linux.
Other operating systems: Windows and OSX workstations do not currently have any of these paths mounted. In the future, project and archive may be automatically mounted. You can always remote mount via sshfs or SMB. See the remote access page for Linux, Mac, and Windows instructions for home, project, and archive. In OSX, there is a shortcut in the launcher for mounting home. On Windows workstations, this is the Z drive. On your own computers, you may need to use AALTO\username as your username for any of the SMB mounts.
Laptops: Laptops have their own filesystems, including home directories. These are not backed up automatically. Other directories can be mounted as described on the remote access page.
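As a sketch, mounting a project directory over sshfs from your own Linux machine might look like this (the server follows the conventions below; the group name and mount point are illustrative):
$ mkdir -p ~/mnt/project
$ sshfs username@magi.cs.aalto.fi:/m/cs/project/mygroup ~/mnt/project
# work on the files, then unmount when done
$ fusermount -u ~/mnt/project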
Summary table
This table lists all available options in Science-IT departments, including those not managed by departments. In general, project is for most research data that requires good backups. For big data, use scratch. Request separate projects when needed to keep things organized.
Filesystem | Path (Linux) | Triton? | Quota | Backups? | Notes
---|---|---|---|---|---
home | /u/…/$username/unix | no | 100 GiB | yes, $HOME/../.snapshot/ | Used for personal and non-research files
project | /m/$dept/project/$project/ | some | per-project, up to 100s of GiB | Yes, hourly/daily/weekly (.snapshot) |
archive | /m/$dept/archive/$project/ | some | per-project, up to 100s of GiB | Yes, hourly/daily/weekly + off-site tape backups (.snapshot) |
scratch | /m/$dept/scratch/$project/ | yes | per-project, 2 PiB available | RAID6, but no backups | Don't even think about leaving irreplaceable files here! Needs a Triton account.
work (Triton) | /m/$dept/work/$username/ | yes | 200 GB default | RAID6, but no backups | Same as scratch. Needs a Triton account.
local | /l/$username/ | yes | usually a few 100s of GiB available | No, and destroyed if the computer is reinstalled | The directory needs to be created and permissions made reasonable (quite likely 'chmod 700 /l/$USER'; by default it has read access for everyone!). Space usage: du -sh /l/. Not shared among computers.
tmpfs | /run/user/$uid/ | yes | local memory | No | Not shared.
webhome | $HOME/public_html/ (/m/webhome/…) | no | 5 GiB | |
custom solutions | Contact us for special needs, like sensitive data, etc. | | | |
General notes
The table above summarizes the types of filesystems available; the notes and sections below give details.
The path /m/$dept/ is designed to be a standard location for mounts. In particular, this is shared with Triton.
The server magi is magi.cs.aalto.fi and is for the CS department. The home directory is mounted there without Kerberos protection, but directories under /m/ need an active Kerberos ticket (which can be acquired with the kinit command). taltta is taltta.aalto.fi and is for all Aalto staff. Both use normal Aalto credentials.
Common problem: the Triton scratch/work directories are automounted. If you don't see a directory, enter the full name (then tab-completion works); it will appear after you try accessing it with the full name.
Common problem: these filesystems are protected with Kerberos, which means that you must be authenticated with Kerberos tickets to access them. This normally happens automatically, but tickets expire after some time. If you are using systems remotely (the shell servers) or have stuff running in the background, this may become a problem. To solve it, run kinit to refresh your tickets.
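To check whether your tickets are still valid, and to renew them:
$ klist   # shows your tickets and their expiry times
$ kinit   # prompts for your Aalto password and refreshes the tickets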
Details
home: your home directory
Shared with the Aalto environment, for example regular Aalto workstations, Aalto shell servers, etc.
Should not be used for research work, personal files only. Files are lost once you leave the university.
Instead, use project for research files, so they are accessible to others after you leave.
Quota 100 GiB.
Backups recoverable via $HOME/../.snapshot/ (on Linux workstations at least).
SMB mounting: smb://home.org.aalto.fi/
project: main place for shared, backed-up project files
/m/$dept/project/$project/
Research time storage for data that requires backup. Good for e.g. code, articles, other important data. Generally for small amounts (10s-100s of GiB) of data per project.
This is the normal place for day to day working files which need backing up.
Multi user, per-group.
Quotas: from 10s to 100s of GiB
Quotas are not designed to hold extremely large research data (TiBs). Ideal case would be 10s of GiB, and then bulk intermediate files on scratch.
Weekly backup to tape (to recover from major failure) + snapshots (recover accidentally deleted files). Snapshots go back:
hourly last 26 working hours (8-20)
daily last 14 days
weekly last 10 weeks
Can be recovered using .snapshot/ within project directories.
Accessible on magi/taltta at the same path.
SMB mounting: smb://tw-cs.org.aalto.fi/project/$group/
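As a sketch, recovering an accidentally deleted file might look like this (the department, group, and snapshot names are illustrative; list the snapshot directory first to see what is available):
$ ls /m/cs/project/mygroup/.snapshot/
# pick a snapshot from before the deletion, then copy the file back
$ cp /m/cs/project/mygroup/.snapshot/SNAPSHOTNAME/lostfile.txt /m/cs/project/mygroup/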
archive:
/m/$dept/archive/$project/
For data that should be kept accessible for 1-5 years after the project has ended. Alternatively, a good place to store a copy of large original data (as a backup).
This is practically the same as project, but retains snapshots for longer so that data is ensured to be written to tape backups.
This is a disk system, so does have reasonable performance. (Actually, same system as project, but separation makes for easier management).
Quotas: 10s to 1000s of GiB
Backups: same as project.
Accessible on magi/taltta at the same path.
SMB mounting: smb://tw-cs.org.aalto.fi/archive/$group/
scratch: large file storage and work, not backed up (Triton).
/m/$dept/scratch/$group/
Research time storage for data that does not require backup. Good for temporary files and large data sets where the backup of original copy is somewhere else (e.g. archive).
This is for massive, high performance file storage. Large reads are extremely fast (1+ GB/s).
This is a Lustre filesystem that is part of Triton (which is in Keilaniemi).
Quotas: 10s to 100s of TiB. The university has 2 PB available total.
In order to use this, you must have a triton account. If you don’t, you get “input/output error” which is extremely confusing.
On workstations, this is mounted via NFS (and accessing it transfers data from Keilaniemi on each access), so it is not fast on workstations, just large file storage. For high performance operations, work on triton and use the workstation mount for convenience when visualizing.
This is RAID6, so it is pretty well protected against single disk failures, but not backed up at all. It is possible that all data could be lost. Don't even think about leaving irreplaceable files here. CSC actually had a problem in 2016 that resulted in data loss. It is an extremely rare (once in decades) event, but it can happen. (Still, it's better than your laptop or a drive on your desk. Human error is the greatest risk here.)
Accessible on magi/taltta at the same path.
SMB mounting: smb://data.triton.aalto.fi/scratch/$dept/$dir/ (username may need to be AALTO\yourusername).
Triton work: personal large file storage and work (Triton)
/m/$dept/work/$username/
This is the equivalent of scratch, but per-person. Data is lost once you leave.
Accessible on magi/taltta at the same path.
SMB mounting: smb://data.triton.aalto.fi/work/$username (username may need to be AALTO\yourusername).
Deleted six months after your account expires.
Not to be confused with Aalto work (see below).
local: local disks for high performance
You can use local disks for day to day work. These are not redundant or backed up at all. Also, if your computer is reinstalled, all data is lost.
Performance is much higher than any of the other network filesystems, especially for small reads. Scratch+Triton is still faster for large reads.
If you use this, make sure you set UNIX permissions to restrict the data properly. Ask if you are not sure.
If you store sensitive data here, you are responsible for physical security of your machine (as in no one taking a hard drive). Unix permissions should protect most other cases.
When you are done with the computer, you are also responsible for secure management/wiping/cleanup of this data.
See the note about disk wiping under Aalto Linux (under “when you are done with your computer”). IT should do this, but if it’s important you must mention it, too.
tmpfs: in-memory filesystem
This is a filesystem that stores all data in memory. It is extremely high performance, but extremely temporary (lost on each reboot). Also shares RAM with your processes, so don’t use too much and clean up when done.
TODO: are these available everywhere?
webhome: web space for users.aalto.fi
This is the space for users.aalto.fi; it can be accessed from the public_html link in your home directory.
This is not a real research filesystem, but convenient to note here.
Quota (2020) is 5 GiB. (/m/webhome/webhome/)
triton home: triton’s home directories
Not part of departments, but documented here for convenience
The home directory on Triton.
Backed up daily.
Not available on workstations.
Quota: 1 GB
Deleted six months after your account expires.
Aalto work: Aalto’s general storage space
/work/$deptcode on Aalto workstations and servers.
Not often used within Science-IT departments: we use project and archive above, which are managed by us and practically equivalent. You could request space from here, but expect less personalized service.
Aalto home directories are actually here now.
You may request storage space from here: email the Aalto servicedesk and request space on work. The procedures are not very well established.
Data is snapshotted and backed up offsite for disaster recovery.
Search https://it.aalto.fi for “work.org.aalto.fi” for the latest instructions.
SMB mounting via smb://work.org.aalto.fi
Aalto teamwork: Aalto’s general storage space
Not used directly within Science-IT departments: we have our own direct interfaces to this, and the project and archive directories are actually here.
For information on getting teamwork space (outside of Science-IT departments), contact servicedesk.
Teamwork is unique in that it is arbitrarily extensible, and you may buy the space from the vendor directly. Thus, you can use external grant money to buy storage space here.
SMB mounting via smb://teamwork.org.aalto.fi
Confidential data handling
Confidential data is data which has some legal reason to be protected.
Confidential or sensitive data
Note
The following description is written for the CS department, but applies almost equally to NBE and PHYS. This is being expanded and generalized to other departments as well. Regardless of your department, these are good steps to follow for any confidential data at Aalto.
Note
This meets the requirements for “Confidential” data, which covers most use cases. If you have extreme requirements, you will need something more (but be careful about making custom solutions).
Aalto has some guidelines for classification of confidential information, but they tend to deal with documents as opposed to practical guidelines for research data. If you have data which needs special attention, you should put it in a separate group and tell us when creating the group.
The following paragraph is a "summary for proposals", which can be used when the CS data security needs to be documented. This is for the CS department, but a similar text can be created for other departments. A longer description is also available.
Aalto CS provides secure data storage for confidential data. This data is stored centrally in protected datacenters and is managed by dedicated staff. All access is through individual Aalto accounts, and all data is stored in group-specific directories with per-person access control. Access rights via groups is managed by IT, but data access is only provided upon request of the data owner. All data is made available only through secure, encrypted, and password-protected systems: it is impossible for any person to get data access without a currently active user account, password, and group access rights. Backups are made and also kept confidential. All data is securely deleted at the end of life. CS-IT provides training and consulting for confidential data management.
If you have confidential data at CS, follow these steps. CS-IT takes responsibility that data managed this way is secure, and it is your responsibility to follow CS-IT’s rules. Otherwise you are on your own:
Request a new data folder in the project from CS-IT. Notify them that it will hold confidential data and any special considerations or requirements. Consider how fine-grained you would like the group: you can use an existing group, but consider how many people will have access.
Store data only in this directory on the network drive. It can be accessed from CS computers, see data storage.
To access data from laptops (Aalto or your own), use network drive mounting, not copying. Also consider temporary files: don't store intermediate work or let your programs save temporary files on your own computer.
Don’t transfer the data to external media (USB drives, external hard drives, etc) or your own laptops or computers. Access over the network.
All data access should go through Aalto accounts. Don't send data to others or create other access methods. Aalto accounts provide central auditing and access control.
Realize that you are responsible for the day to day management of data and using best practices. You are also responsible for ensuring that people who have access to the data follow this policy.
In principle, one can store data on laptops or external devices with full disk encryption. However, in this case we do not take responsibility; you must ask us about this first. In general, it's best to try to adapt to the network drive workflow. (Laptop full disk encryption is a good idea anyway.)
We can assist in creating more secure data systems, as can Aalto IT security. It’s probably more efficient to contact us first.
Personal data (research data about others, not about you)
“Personal data” is any data concerning an identifiable person. Personal data is very highly regulated (mainly by the Personal Data Act, soon by the General Data Protection Regulation). Aalto has a document that describes what is needed to process personal data for research, which is basically a research-oriented summary of the Personal Data Act. Depending on the type of project, approval from the Research Ethics Committee may be needed (either for publication, or for human interaction. The second one would not usually cover pure data analysis of existing data). Personal data handling procedures are currently not very well defined at Aalto, so you will need to use your judgment.
However, most research does not need data to be personally identifiable, and thus research is made much simpler. Thus, you want to try to always make sure that data is not identifiable, even to yourself using any technique (anonymization). The legal requirement is “reasonable likelihood of identification”, which can include technical and confidentiality measures, but in the end is still rather subjective. Always anonymize before data arrives at Aalto, if possible. Let us know when you have personal data, so we can make a note of it in the data project.
However, should you need to use personal data, the process is not excessively involved beyond what you might expect (informed consent, ethics, but then a notification of personal data file). Contact us for initial help in navigating the issues and RIS for full advice.
Boilerplate descriptions of security
For grants, etc. you can find a description under Boilerplate text for grant proposals
Long-term archival
Long-term archival is important to make sure that you have the ability to access your group's own data in the long term. Aalto resources are not currently intended for long-term archival. There are other resources available for this, such as:
the EU-funded Zenodo for open published data (embargoed data and closed data is also somewhat supported).
Finland’s IDA (for large data, closed or open). There are Aalto-specific instructions for IDA here.
There is supposed to be an alternate Finnish digital preservation service coming in 2017, and it’s unclear what the intention of IDA is in light of that.
Leaving Aalto
Unfortunately, everyone leaves Aalto sometime. Have you considered what will happen to your data? Do you want to be remembered? This section currently is written from the perspective of a researcher, not a professor-level staff member, but if you are a group leader you need to make sure your data will stay available! Science-IT (and most of these resources) are focused on research needs, not archiving a person’s personal research data (if we archive it for a person who has left, it’s not accessible anyway! Our philosophy is that it should be part of a group as described above.). In general, we can archive data as part of a professor’s group data (managed in the group directories the normal ways), but not for individuals.
Remember that your home directories get removed when your account expires (we think in only two weeks!).
Data in the group directories won't be automatically deleted. But you should clean up all your junk and leave only what is needed for future people. Remember, if you don't take care of it, it becomes extremely hard for anyone else to. The owner of the group (professor) will be responsible for deciding what to do with the data, so make sure to discuss with them and make it easy for them to do the right thing!
Make sure that the data is documented well. If it's undocumented, then it's unusable anyway.
Can your data be released openly? If you can release something as open data on a reputable archive site like Zenodo, you can ensure that you will always have access to it. (The best way to back up is to let the whole internet do it for you.)
For lightweight archival (~5 years past last use, not too big), the archive filesystem is suitable. The data must be in a group directory (probably your professor’s). Make sure that you discuss the plans with them, since they will have to manage it.
IDA (see above) could be used for archival of any data, but you will have to maintain a CSC account (TODO: can this work, and how?). Also, these projects have to be owned by a senior-level staff person, so you have to transfer it to a group anyway.
Finland aims to have a long-term archival service by 2017 (PAS), but this is probably not intended for own data, only well-curated data. Anyway, if you need something that long and it isn’t confidential, consider opening it.
Data organization
How should data be stored? On the simplest level, this asks “on what physical disks”, but this page is concerned about something more high-level: how you organize data on those disks.
Data organization is very important, because if you don't do it early, you end up with an epic mess which you will never have time to clean up. If you organize data well, then everything afterwards becomes much easier: you can archive what you need, others can find what they need, and you can open what you need easily.
Everything here applies equally if you are working alone or if you are part of a team.
Organize your projects into directories
Names
As simple as it seems, choosing a good name for each distinct workspace is an important first step. This serves as an identifier to you and others, and by having a name you are able to refer to, find, and organize your data now and in the future.
A name should be unique among all of your work over all your career, and also unique among all of your colleagues (and any major public projects). Don't reuse the same names for related things.
For example, let's say I have a project called xproject. If I track the code separately from the data, I'd have a separate directory called xproject-data, and the main project refers to the data directory instead of copying the data.
How many named workspaces should you have for each project? It depends on how large they are and how diverse the types of data are. If the data is small and not very demanding, it doesn’t matter much. If you have large data vs small other files, it may be good to separate out the data. If you have some data/code/files which will be reused in different projects, it makes sense to split them. If you have confidential data that can’t be shared, it’s good to separate them from the rest of the data.
Names should be usable as directory names and identifiers. Try to stick to letters, numbers, -, and _ - no spaces, punctuation, or symbols. Then, the name is usable on repositories and other services, too.
Good names include MobilityAnalysis, transit, transit-hsl, and lgm-paper. Bad names are too general given their purpose or what else you might do.
Each directory's contents move together as a unit, as much as possible.
Organizing these directories
You should have a flat organization in as few places as possible. For example, on your laptop you may have ~/project for the stuff you mainly work on and ~/git for other minor version controlled things. On your workstations or servers, you may also have /scratch/work/$username which is your personal stuff that is not backed up, /m/cs/project/$groupname/$username/ which is backed up, /local which is temporary stuff on your own computer, and so on. The server-based locations can be easily shared among multiple people.
Your structure should be as flat as possible, without many layers in each directory. Thus, to find a given project, you only need to look inside each of the main locations above, not inside every other project. This allows you to get the gist of your data for future archival or clean-up. When two directories need to refer to each other, have them directly refer to each other where they are, for example use ../xproject-data from inside the xproject directory. (You can have subdirectories inside the projects.)
Different types of projects go in different places. For example, xproject can be in the backed up location because it's your daily work, while xproject-data is in some non-backed up place because you can always recover the data.
Synchronizing
If you work on different systems, each directory of the same name should have roughly the same contents - as if you could synchronize it with version control.
For small stuff, you might synchronize with version control. You may use some other program, like Dropbox or the like. Or in the case of data which has a master copy somewhere else, you just download what you need.
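As a sketch, a one-way sync of a project directory to a server might look like this (the server is taltta from the storage pages; the group and paths are illustrative):
$ rsync -av --exclude scratch/ ~/project/xproject/ username@taltta.aalto.fi:/m/cs/project/mygroup/xproject/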
Organize files within directories
Traditional organization
This is the traditional organization within a single person’s project. The key concept is separation of code, original data, scratch data, and final outputs. Each is handled properly.
PROJECT/code/ - backed up and tracked in a version control system.
PROJECT/original/ - original and irreplaceable data. Backed up at the same time it is placed here.
PROJECT/scratch/ - bulk data, can be regenerated from code+original.
PROJECT/doc/ - final outputs, which should be kept for a very long term.
PROJECT/doc/paper1/ - different papers/reports, if not stored in a different project directory.
PROJECT/doc/paper2/
PROJECT/doc/opendata/
When the project is over, code/ and doc/ can be backed up permanently (original/ is already backed up) and the scratch directory can be kept for a reasonable time before it is removed (or put into cold storage).
The most important thing is that code is kept separate from the data. This means no copying files over and over to minor variations. Code should be adjustable for different purposes (and you can always get the old versions from version control). Code is run from the code directory; there is no need to copy it to each folder individually.
Multi-user
The system above can be trivially adapted to suit a project with multiple users:
PROJECT/USER1/.... - each user directory has its own code/, scratch/, and doc/ directories. Code is synced via the version control system. People use the original data straight from the shared folder in the project.
PROJECT/USER2/....
PROJECT/original/ - this is the original data.
PROJECT/scratch/ - shared intermediate files, if they are stable enough to be shared.
For convenience, each user can create a symbolic link to the original/ data directory from their own directory.
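For example, from inside your own user directory:
$ cd PROJECT/USER1
$ ln -s ../original original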
Master project
In this model, you have one long-term master directory for a whole research group, with many different users and research themes within it. As time goes on, once users leave, their directories can be cleaned up and removed. The same can happen for the themes.
PROJECT/USER1/SUBPROJECT1/...
PROJECT/USER1/SUBPROJECT2/...
PROJECT/USER2/SUBPROJECT1/...
PROJECT/original/
PROJECT/THEME/USER1/...
PROJECT/THEME/USER2/...
PROJECT/archive/
Common variants
Simulations with different parameters: all parameters are stored in the code directory, within version control. The code knows what parameters to use when making a new run. This makes it easy to see the entire history of your simulations.
Downloading data: this can be put into either original or scratch, depending on how much you trust the original source to stay available.
Multiple sub-projects: these can be split into separate named directories, as described above.
Multiple types of code: separate long-term code from scratch research code. You can separate parameters from code. And so on…
Projects
In Aalto, data is organized into project groups. Each project has members who can access the data, and different shared storage spaces (project, archive, scratch (see below)). You can apply for these whenever you need.
What should a project contain? How much should go into the same project?
One project that lasts forever per research group: This is traditional. A professor gets a project allocated, and then people put data in it. There may be subdirectories for each researcher or topic, and some shared folders for common data. The problem is that the size grows without bound: who will ever clean up all the old stuff? These projects have a way of growing until the data is no longer manageable, but they are convenient because the organization stays flat.
If data size is small and growing slower than storage, this works for long-term.
It can also work if particular temporary files are managed well and eventually removed.
One project for each distinct theme: A research group may become interested in some topic (for example, a distinct funded project), and they get storage space just for this. The project goes on and is eventually closed.
You can be more fine-grained in access, if data is confidential
You can ensure that the data stays together
You can ensure that data end-of-life happens properly. This is especially useful for showing you are managing data properly as part of grant applications.
You can have a master group as a member of the specific project. This allows a flat organization, where all of your members can access all data in different projects.
Science-IT data policy
Note
This was originally developed at CS, but applies to all departments managed by the Science-IT team.
In Aalto, large amounts of data with a variety of requirements are processed daily. This describes the responsibilities of IT support and users with respect to data management.
Everyone should know the summary items below. The full policy is for reference in case of doubts (items in bold are things which are not completely obvious).
This policy is designed to avoid the most common problems by advance planning for the majority case. Science-IT is eager to provide a higher level of service for those who need it, but users must discuss with staff. This policy is jointly implemented by department IT and Science-IT.
Summary for users
Do not store research data in home directories: it is not accessible should something happen to you or when you leave, and home directories are automatically deleted.
Project directories are accessible to ALL members; files not intended for access by ALL members should be stored in a separate project.
Workstations and mobile devices are NOT backed up. Directories with backups are noted. It is your responsibility to make sure that you store data in backed-up places. Don't consider only disk failure, but also user error, loss of device, etc.
Data stored in project directories is managed by the (professor, supervisor) who owns the directory, and they can make decisions regarding access now and in the future. Any special considerations should be discussed with them.
Data is not archived or saved for individual users. Data which must be saved should be in a shared project directory with an owner who is still at Aalto. Triton's individual user data is permanently deleted 6 months after the expiration date of the user account (Aalto home directories may be deleted even sooner).
There is no default named security level - of course we keep all data secure, but should you be dealing with legally confidential files, you must ask us.
Summary for data directory owners (professors or long-term staff)
Data in the shared directories is controlled by you, and you make decisions about it.
All data within a project is accessible by all members of that project. Make more projects if more granularity is needed.
Data must have an expiration time, and this is extended as needed. Improperly managed data is not stored indefinitely. If data is long-term archived, it must still have an administrative owner at Aalto who can make decisions about it.
There must be a succession plan for the data, should the data owner leave or become unavailable to answer questions. By default this is the supervisor or department head. They will make decisions about access, management, and end-of-life.
We will try to handle whatever data you may need us to. The only prerequisite is that it is managed well. We can’t really define “managed well”, but at least it means you know what it contains and where the space is going.
Detailed policy
This is the detailed policy. The important summary for users and owners is above, but the full details are written below for avoidance of doubts.
Scope
This policy concerns all data stored in the main provided locations or managed by Science-IT staff (including its core departments).
Responsibilities
In data processing and rules we follow Finnish legislation and Aalto university policies in this order.
If there are agreements with a third-party organization for data access, those rules are honored next. For this type of data, we must be consulted prior to storing the data.
Users are expected to follow all Aalto and CS policies, as well as good security practices.
IT is expected to provide good service, data security, and instruction on best practices.
Storage
All data must have an owner and a given lifespan. Data cannot be stored indefinitely, but of course the lifespan is routinely extended when needed. There are other long-term archival services.
Work-related data should always be stored outside the user's HOME directory. HOME is meant only for private and non-work-related files. (IT staff are not allowed to retrieve lost research files from a user's home directory.)
Centrally available folders other than HOME (i.e. Project, Archive, Scratch) are meant for work-related information only.
Desktop computers are considered stateless. They can be re-installed at any point by IT if necessary. Data stored on local workstations is always considered temporary and is not backed up. IT support will still try to inform users of changes.
Backed-up data locations are listed. It is the user’s responsibility to ensure that data is stored in backed-up locations as needed. Mobile devices (laptops) and personal workstations are not backed up.
Ownership, and access rights, and end-of-life
Access rights in this policy refer only to file system rights. Other rights (e.g. IPR) to the stored information are not part of this policy.
There must be a clear owner and chain of responsibility (successor) for all data (who owns it and can make decisions and who to ask when they leave or become unavailable).
For group directories (e.g. project, archive, scratch), file system permissions (the possibility to read, write, copy, modify, and delete these files) belong to the group. There is no more granular access, for example single files with more restrictive permissions. Permissions will be fixed by IT on request from group members.
The group owner-on-file can make any decisions related to data access, management, or end-of-life.
Should a data owner of a group directory become unavailable or unable to answer questions about access, management, or end-of-life, the successor they named may make decisions regarding the data access, including end-of-life. This defaults to their supervisor (e.g. head of department), but should be discussed on data opening.
Triton data stored on folders that are not group directories (e.g. the content of /scratch/work/USERNAME or /home/USERNAME) will be permanently deleted after 6 months from the user’s account expiration. Please remember to back up your data if you know that your account is expiring soon. (Note that Aalto home directory data may be removed even earlier)
Should researchers need a more complex access scheme, this must be discussed with IT support.
Security/Confidentiality
Unless there is a notification, there is no particular guaranteed service level regarding confidential data. However, all services are expected to be as secure as possible and are designed to support confidential data.
Should a specific security level be needed, that must be agreed separately.
Data stored to the provided storage location is not encrypted at rest.
Confidentiality is enforced by file system permissions; access changes will always be confirmed with the data owner.
All storage media (hard drives, etc.) should be securely wiped, to the extent technically feasible, at end of life. This is handled by IT for IT-managed storage; for other devices it must be handled by the end users.
All remote data access should use strong encryption.
Users must notify IT support or their supervisor about any security issues or misuse of data.
Security of laptops, mobile devices and personal devices is not currently guaranteed by IT support. Confidential data should use centralized IT-provided services only.
Users and data owners must take primary responsibility for data security, since technical security is only one part of the process.
Communication
Details about centrally provided folders and best practices are available in online documentation.
Changes to policy will be coordinated by department management. All changes will at least be announced to data owners, but individual approvals are not needed unless a service level drops.
Data on Triton
Triton is a computer cluster that provides large and fast data storage connected to significant computing power, but it is not backed up.
Tutorial: Data storage
Tutorial: Remote access to data
Triton quick reference: Storage and Remote data access
Overview with checklist: Storage
Data management
This section covers administrative and organizational matters about data. For university-level policy, see the Aalto Research Data Management pages; here we focus on the practical side of things.
Other
Summary table
O = good, x = bad
| | Large | Fast | Confidential | Frequent backups | Long-term archival |
|---|---|---|---|---|---|
| Code | OO | OO | | | |
| Original data | O | O | OO? | OO | OO |
| Intermediate files | OO | OO | OO? | | |
| Final results/open data | OO | | | | |
| | | Large | Fast | Confidential | Backups | Long-term archival | Shareable |
|---|---|---|---|---|---|---|---|
| Triton | scratch | OO | OO | O | x | x | O |
| | work | OO | OO | O | x | x | |
| | Triton home | x | O | OO | | | |
| | Local disks | O | OO | O | | | |
| | ramfs | OOO | OO | | | | |
| Depts | /m/…/project | O | O | OO | OO | O | |
| | /m/…/archive | O | O | OO | OO | O | O |
| Aalto | Aalto home | OO | OO | | | | |
| | Aalto work | O | O | OO | OO | O | |
| | Aalto teamwork | O | O | OO | OO | O | |
| | Aalto laptops | x | x | x | | | |
| | Aalto webspace | OO | | | | | |
| | version.aalto.fi | OO | OO | O | OO | | |
| | ACRIS | O | O | | | | |
| | Eduuni | | | | | | |
| | Aalto Wiki | | | | | | |
| Finland | Funet filesender | O | OO | | | | |
| | CSC cPouta | O | O | O | | | |
| | CSC Ida | OOO | x | OO | O | O | |
| | FSD | OO | O | OO | O | | |
| Public | github | x | OO | | | | |
| | Zenodo | OO | OO | | | | |
| | Google drive | x | O | | | | |
| | OneDrive | | | | | | |
| | Own computers | x | x | x | | | |
| | Emails | x | x | x | | | |
| | EUDAT B2SHARE | O | O | O | | | |
Cheatsheets: Data, A4 Data management plan.
Triton
Triton is the Aalto high-performance computing cluster. It is your go-to resource for anything that exceeds your desktop computer's capacity. To get started, you could check out the tutorials (going through all the principles) or the quickstart guide (if you pretty much know the basics).
Triton cluster
Triton is the Aalto high-performance computing cluster. It serves all researchers of Aalto, but is currently coordinated from within the School of Science. Access is free for researchers (see Triton accounts, students not doing research should check out our intro for students). It is similar to the CSC clusters, though CSC clusters are larger and Triton is easier to use because it is more integrated into the Aalto environment.
Overview
Triton accounts
You need to request Triton access separately; however, the account information (username, password, shell, etc.) is shared with the Aalto account, so there is not actually a separate account. Triton access is available to any researcher at Aalto for free. Resources are funded by departments and distributed by a fairshare algorithm: members of departments and schools which provide direct funding have a greater share.
Please use the account request form (“Triton: New user account”) to request the account. (For future help, you should probably use our issue tracker: see the Getting Triton help page.)
A few prerequisites:
You must check the intended use cases and agree with the Triton usage policies, including the data and privacy policies.
You must have a valid Aalto account.
Also tell us your department/school in your account creation request.
You should have enough background to use Triton well, including Linux skills. Read hands-on scientific computing, and you should know A (“Basics”), C (“Linux”), and D (“HPC”) well. Also see the Triton tutorials.
Accounts are for (see details):
Research, for which a research supervisor can take responsibility. (please tell us who your supervisor is in your account request).
Anyone affiliated with a research PI in any way (for affiliated research).
… for use in research work (scholarly publications, datasets, research software, theses, etc.) Triton should be acknowledged and linked in ACRIS.
… for use in research theses (PhD, Masters, Bachelors). However, literature-review and other non-research theses shouldn't use Triton (and shouldn't have a need for it). These should also be acknowledged in ACRIS.
A thesis at a company doesn't necessarily disqualify Triton use, but Triton should be used for the relevant research that will be in the thesis, not other company work.
Other staff use (minor)
For example, testing computational methods, supporting other research, or simple usage of computational tools. However, Triton isn’t designed as an operative environment and has no service guarantees.
NOT for: study projects
There are other resources for this, see Welcome, students!
If you are a student doing a course project in which you join a research group and contribute to research, you fit in the research category and may use Triton. Be clear about this in your request; put your research supervisor (not course instructor) as supervisor. The supervisor should respond to an email confirming the account.
Students coming to one of our Scientific Computing in Practice courses which use Triton. You will be specifically told if this is the case.
Visitors, students (non-employed), and teaching-assistant job titles will have their research supervisor checked before the account is created (your supervisor will have to reply "yes"; you can remind them to read their email and answer). Others will have their research supervisor cc:ed in the account creation message.
You know that you have Triton access if you are in the triton-users group at Aalto: the groups command shows this on Aalto Linux machines.
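For example, on an Aalto Linux machine (output abbreviated; your other groups will differ):

```console
$ groups
... triton-users ...
```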
Your department/unit
When you get an account, you are added to a unit's group, which is "billed" for your usage. If you change Aalto units, this may need to be updated. Check sshare -U or sshare, and if it's wrong, let us know (the unit is first on the line). (These are currently by department, so changes are not that frequent.)
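For example (illustrative output; the account name shown here is hypothetical, and the columns are the standard sshare ones):

```console
$ sshare -U
Account    User      RawShares  NormShares  RawUsage  EffectvUsage  FairShare
dept-cs    username  ...
```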
Password change and other issues
Since your Triton account is a regular Aalto account, for any password change, shell change etc use Aalto services. You can always do these on the server kosh.aalto.fi (at least).
If you are in doubt, for any account-related issue your primary point of contact is your local support team member via the support email address. Do not post such issues on the tracker.
Account deactivation / remove from mailing list
Your account lasts as long as your Aalto account does, and the triton-users mailing list is directly tied to Triton accounts: when your account ends, you are also unsubscribed from the mailing list (they go together; you can't just be removed from the mailing list).
If you want to deactivate your account, send an email to the scicomp email address (scicomp -at- aalto.fi). You can save time by saying something like the following in your message (otherwise we will reply to confirm; if you have any special requests or need help, ask us): "I realize that I will lose access to Triton, I have made plans for any important data, and I realize that any home and work directory data will eventually be deleted".
Before you leave, please clean up your home/work/scratch directory data. Consider who should have your data after you are done: does your group still need access to it? You won't have access to the files after your account is deactivated. Note that scratch/work directory data is unrecoverable after deletion, which will happen eventually. If data is stored in a group directory (/scratch/$dept/$groupname), it won't be deleted and will stay managed by the group owner.
Terms of use/privacy policy
See the Usage policies and legal page.
Triton quick reference
This page collects all the important reference information: a quick-reference guide for the Triton cluster at Aalto University, also useful for many other Slurm clusters. See also the printable Triton cheatsheet, as well as other cheatsheets.
Connecting
See also: Connecting to Triton.
| Method | Description | From where? |
|---|---|---|
| ssh from Aalto networks | Standard way of connecting via command line. Hostname is triton.aalto.fi; works from the Linux/Mac/Windows command line. See Connecting via ssh for details and options. | VPN and Aalto networks (VPN, most wired, internal servers) |
| ssh (from rest of Internet) | Use the Aalto VPN and the row above. If needed: same as above, but you must set up an SSH key first. | Whole Internet, if you first set up an SSH key AND also use passwords (since 2023) |
| VDI | "Virtual desktop interface", https://vdi.aalto.fi; from there, use ssh to connect to Triton. | Whole Internet |
| Jupyter | Since April 2024 Jupyter is part of Open OnDemand, see below. More info. | See the corresponding OOD section |
| Open OnDemand | https://ondemand.triton.aalto.fi, web-based interface to the cluster. Also known as OOD. Includes shell access, GUI, data transfer, Jupyter, and a number of GUI applications like Matlab. More info. | VPN and Aalto networks, or through VDI |
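For example, from a machine inside the Aalto network:

```bash
ssh USERNAME@triton.aalto.fi
```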
Modules
See also: Software modules.
| Command | Description |
|---|---|
| `module load MODULE` | load module |
| `module avail` | list all modules |
| `module spider PATTERN` | search modules |
| `module spider MODULE/VERSION` | show prerequisite modules to this one |
| `module list` | list currently loaded modules |
| `module show MODULE` | details on a module |
| `module help MODULE` | details on a module |
| `module unload MODULE` | unload a module |
| `module save ALIAS` | save module collection to this alias (saved in `~/.lmod.d/`) |
| `module savelist` | list all saved collections |
| `module describe ALIAS` | details on a collection |
| `module restore ALIAS` | load saved module collection (faster than loading individually) |
| `module purge` | unload all loaded modules (faster than unloading individually) |
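A typical workflow, assuming the Lmod module system (the version string here is hypothetical; check the `module spider` output for real ones):

```bash
module spider gcc        # search for available versions
module load gcc/12.3.0   # hypothetical version string
module list              # confirm what is loaded
```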
Common software
See also: Applications.
Storage
See also: Data storage
| Name | Path | Quota | Backup | Locality | Purpose |
|---|---|---|---|---|---|
| Home | `$HOME` (`/home/USERNAME`) | hard quota 10GB | Nightly | all nodes | Small user-specific files, no calculation data. |
| Work | `$WRKDIR` (`/scratch/work/USERNAME`) | 200GB and 1 million files | x | all nodes | Personal working space for every user. Calculation data etc. Quota can be increased on request. |
| Scratch | `/scratch/DEPT/PROJECT` | on request | x | all nodes | Department/group-specific project directories. |
| Local temp | `/tmp` | limited by disk size | x | single-node | Primary (and usually fastest) place for single-node calculation data. Removed once user's jobs are finished on the node. |
| Local persistent | | varies | x | dedicated group servers only | Local disk persistent storage. On servers purchased for a specific group. Not backed up. |
| ramfs (login nodes only) | `/dev/shm` | limited by memory | x | single-node | Ramfs on the login node only, in-memory filesystem |
Remote data access
See also: Remote access to data.
| Method | Description |
|---|---|
| rsync transfers | Transfer back and forth via command line. Set up ssh first. |
| SFTP transfers | Operates over SSH. `sftp://triton.aalto.fi` in file browsers (Linux at least), or FileZilla. |
| SMB mounting | Mount (make remote viewable locally) to your own computer. Linux and MacOS: file browser, `smb://data.triton.aalto.fi/scratch/`. Windows: the same server via its network path. |
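For example, a minimal rsync transfer from inside an Aalto network (paths are illustrative):

```bash
# Push a local folder to your Triton work directory
rsync -avz mydata/ USERNAME@triton.aalto.fi:/scratch/work/USERNAME/mydata/
```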
Partitions
| Partition | Max job size | Mem/core (GB) | Tot mem (GB) | Cores/node | Limits | Use |
|---|---|---|---|---|---|---|
| `<default>` | If you leave the partition off, all possible partitions will be used (based on time/mem) | | | | | |

Use `slurm partitions` to see more details.
Job submission
See also: Serial Jobs, Array jobs: embarrassingly parallel execution, Parallel computing: different methods explained.
| Command | Description |
|---|---|
| `sbatch SCRIPT.sh` | submit a job to the queue (see standard options below) |
| `srun COMMAND` | within a running job script/environment: run code using the allocated resources (see options below) |
| `srun COMMAND` | on the frontend: submit to queue, wait until done, show output (see options below) |
| `sinteractive` | submit a job, wait, provide a shell on the node for interactive playing (X forwarding works, default partition interactive). Exit the shell when done. (see options below) |
| `srun --pty bash` | (advanced) another way to run interactive jobs, no X forwarding but simpler. Exit the shell when done. |
| `scancel JOBID` | cancel a job in the queue |
| `salloc` | (advanced) allocate resources from the frontend node; use `srun` to run with them |
| `scontrol` | view/modify job and slurm configuration |
| Command | Option | Description |
|---|---|---|
| `sbatch`/`srun` | `--time=HH:MM:SS` | time limit |
| | `--time=DD-HH` | time limit, days-hours |
| | `--partition=PARTITION` | job partition. Usually leave off and things are auto-detected. |
| | `--mem-per-cpu=N` | request N MB of memory per core |
| | `--mem=N` | request N MB memory per node |
| | `--cpus-per-task=N` (`-c N`) | allocate N CPUs for each task. For multithreaded jobs. (compare `--ntasks`: `-c` means the number of cores for each process started.) |
| | `--nodes=N-M` | allocate minimum of N, maximum of M nodes. |
| | `--ntasks=N` | allocate resources for and start N tasks (one task = one process started; it is up to you to make them communicate. However, the main script runs only on the first node; the sub-processes run with `srun` are run this many times.) |
| | `--job-name=NAME` | short job name |
| | `--output=OUTPUTFILE` | print output into file OUTPUTFILE |
| | `--error=ERRORFILE` | print errors into file ERRORFILE |
| | `--exclusive` | allocate exclusive access to nodes. For large parallel jobs. |
| | `--constraint=FEATURE` | request feature (see `slurm features` for the list of features) |
| | `--array=0-5,7` | run the job multiple times; use the variable `$SLURM_ARRAY_TASK_ID` to adjust parameters |
| | `--gpus=N` | request a GPU, or `--gpus=TYPE:N` for a specific type |
| | `--tmp=NG` | request nodes that have disks, with at least N GB of local space |
| | `--mail-type=TYPE` | notify of events: `BEGIN`, `END`, `FAIL`, or `ALL` |
| | `--mail-user=EMAIL` | whom to send the email |
| `echo $SLURM_JOB_NODELIST` | | print allocated nodes (from within a script) |
| Command | Description |
|---|---|
| `slurm queue` / `slurm q` | status of your queued jobs (long/short) |
| `slurm partitions` | overview of partitions (A/I/O/T = active, idle, other, total) |
| `slurm cpus PARTITION` | list free CPUs in a partition |
| `slurm history [TIMESPAN]` | show status of recent jobs |
| `seff JOBID` | show percent of mem/CPU used in a job. See Monitoring. |
| | show GPU efficiency |
| `scontrol show job JOBID` | job details (only while running) |
| `squeue` | show status of all jobs |
| `sacct` | full history information (advanced, needs args) |
Full slurm command help:
$ slurm
Show or watch job queue:
slurm [watch] queue show own jobs
slurm [watch] q show user's jobs
slurm [watch] quick show quick overview of own jobs
slurm [watch] shorter sort and compact entire queue by job size
slurm [watch] short sort and compact entire queue by priority
slurm [watch] full show everything
slurm [w] [q|qq|ss|s|f] shorthands for above!
slurm qos show job service classes
slurm top [queue|all] show summary of active users
Show detailed information about jobs:
slurm prio [all|short] show priority components
slurm j|job show everything else
slurm steps show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
slurm h|history show jobs finished since, e.g. "1day" (default)
slurm shares
Show nodes and resources in the cluster:
slurm p|partitions all partitions
slurm n|nodes all cluster nodes
slurm c|cpus total cpu cores in use
slurm cpus cores available to partition, allocated and free
slurm cpus jobs cores/memory reserved by running jobs
slurm cpus queue cores/memory required by pending jobs
slurm features List features and GRES
Examples:
slurm q
slurm watch shorter
slurm cpus batch
slurm history 3hours
Other advanced commands (many require lots of parameters to be useful):
| Command | Description |
|---|---|
| `squeue` | full info on queues |
| `scontrol show partitions` | advanced info on partitions |
| `sinfo -N` | list all nodes |
Slurm examples
See also: Serial Jobs, Array jobs: embarrassingly parallel execution.
Simple batch script, submit with sbatch the_script.sh:
#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=1G
module load anaconda
python my_script.py
Simple batch script with array (can also submit with sbatch --array=1-10 the_script.sh):
#!/bin/bash -l
#SBATCH --array=1-10
python my_script.py --seed=$SLURM_ARRAY_TASK_ID
Hardware
See also: Cluster technical overview.
| Node name | Number of nodes | Node type | Year | Arch (Slurm features) | CPU type | Memory configuration | Infiniband | GPUs | Disks |
|---|---|---|---|---|---|---|---|---|---|
| pe[1-48,65-81] | 65 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 128GB DDR4-2133 | FDR | | 900GB HDD |
| pe[49-64,82] | 17 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 256GB DDR4-2133 | FDR | | 900GB HDD |
| pe[83-91] | 8 | Dell PowerEdge C4130 | 2017 | bdw avx avx2 | 2x14 core Xeon E5 2680 v4 2.40GHz | 128GB DDR4-2400 | FDR | | 900GB HDD |
| skl[1-48] | 48 | Dell PowerEdge C6420 | 2019 | skl avx avx2 avx512 | 2x20 core Xeon Gold 6148 2.40GHz | 192GB DDR4-2667 | EDR | | No disk |
| csl[1-48] | 48 | Dell PowerEdge C6420 | 2020 | csl avx avx2 avx512 | 2x20 core Xeon Gold 6248 2.50GHz | 192GB DDR4-2667 | EDR | | No disk |
| milan[1-32] | 32 | Dell PowerEdge C6525 | 2023 | milan avx avx2 | 2x64 core AMD EPYC 7713 @2.0 GHz | 512GB DDR4-3200 | HDR-100 | | No disk |
| fn3 | 1 | Dell PowerEdge R940 | 2020 | avx avx2 avx512 | 4x20 core Xeon Gold 6148 2.40GHz | 2TB DDR4-2666 | EDR | | No disk |
| gpu[1-10] | 10 | Dell PowerEdge C4140 | 2020 | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD |
| gpu[11-17,38-44] | 14 | Dell PowerEdge XE8545 | 2021, 2023 | milan avx avx2 ampere a100 | 2x24 core AMD EPYC 7413 @ 2.65GHz | 503GB DDR4-3200 | EDR | 4x A100 80GB | 440 GB SSD |
| gpu[20-22] | 3 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 kepler | 2x6 core Xeon E5 2620 v3 2.50GHz | 128GB DDR4-2133 | EDR | 4x2 GPU K80 | 440 GB SSD |
| gpu[23-27] | 5 | Dell PowerEdge C4130 | 2017 | hsw avx avx2 pascal | 2x12 core Xeon E5-2680 v3 @ 2.5GHz | 256GB DDR4-2400 | EDR | 4x P100 | 720 GB SSD |
| gpu[28-37] | 10 | Dell PowerEdge C4140 | 2019 | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD |
| dgx[1-2] | 2 | Nvidia DGX-1 | 2018 | bdw avx avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 16GB | 7 TB SSD |
| dgx[3-7] | 5 | Nvidia DGX-1 | 2018 | bdw avx avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 32GB | 7 TB SSD |
| gpuamd1 | 1 | Dell PowerEdge R7525 | 2021 | rome avx avx2 mi100 | 2x8 core AMD EPYC 7262 @3.2GHz | 250GB DDR4-3200 | EDR | 3x MI100 | 32GB SSD |
GPUs
See also: GPU computing.
| Card | Slurm feature name (`--constraint`) | Slurm gres name (`--gres`) | Total amount | Nodes | Architecture | Compute threads per GPU | Memory per card | CUDA compute capability |
|---|---|---|---|---|---|---|---|---|
| Tesla K80* | kepler | | 12 | gpu[20-22] | Kepler | 2x2496 | 2x12GB | 3.7 |
| Tesla P100 | pascal | | 20 | gpu[23-27] | Pascal | 3854 | 16GB | 6.0 |
| Tesla V100 | volta | | 40 | gpu[1-10] | Volta | 5120 | 32GB | 7.0 |
| Tesla V100 | volta | | 40 | gpu[28-37] | Volta | 5120 | 32GB | 7.0 |
| Tesla V100 | volta | | 16 | dgx[1-7] | Volta | 5120 | 16GB | 7.0 |
| Tesla A100 | ampere | | 56 | gpu[11-17,38-44] | Ampere | 7936 | 80GB | 8.0 |
| AMD MI100 (testing) | mi100 | | | gpuamd[1] | | | | |
Command line
See also: Linux shell crash course.
General notes:
- The command line has many small programs that, when connected, allow you to do many things. Only a little bit of this is shown here.
- Programs are generally silent if everything worked, and only print an error if something goes wrong.

- `ls [DIR]`: List the current directory (or DIR if given).
- `pwd`: Print the current directory.
- `cd DIR`: Change directory. `..` is the parent directory, `/` is the root, and `/` also chains directories, e.g. `dir1/dir2` or `../../`
- `nano FILE`: Edit a file (there are many other editors, but nano is common, nice, and simple).
- `mkdir DIR-NAME`: Make a new directory.
- `cat FILE`: Print the entire contents of a file to standard output (the terminal).
- `less FILE`: less is a "pager", and lets you scroll through a file (up/down/pageup/pagedown). `q` to quit, `/` to search.
- `mv SOURCE DEST`: Move (=rename) a file. `mv SOURCE1 SOURCE2 DEST-DIRECTORY/` moves multiple files to a directory.
- `cp SOURCE DEST`: Copy a file. The `DEST-DIRECTORY/` syntax of mv works as well.
- `rm FILE ...`: Remove a file. Note, from the command line there is no recovery, so always pause and check before running this command! The `-i` option will make it confirm before removing each file. Add `-r` to remove whole directories recursively.
- `head [FILE]`: Print the first 10 lines (or N lines with `-n N`) of a file. Can take input from standard input instead of FILE.
- `tail [FILE]`: Like head, but prints the end of the file.
- `grep PATTERN [FILE]`: Print lines matching a pattern in a file; suitable as a primitive find feature, or for quickly searching output. Can also use standard input instead of FILE.
- `du [-ash] [DIR]`: Print disk usage of a directory. Default is KiB, rounded up to block sizes (1 or 4 KiB). `-h` means "human readable" (MB, GB, etc), `-s` means "only of DIR, not all subdirectories also", `-a` means "all files, not only directories". A common pattern is `du -h DIR | sort -h` to print all directories and their sizes, sorted by size.
- `stat FILE`: Show detailed information on a file's properties.
- `find [DIR]`: find can do almost anything, but that means it's really hard to use well. Let's be practical: with only a directory argument, it prints all files and directories recursively, which might be useful itself. Many of us do `find DIR | grep NAME` to grep for the name we want (even though this isn't the "right way"; there are find options which do the same thing more efficiently).
- `|` (pipe): `COMMAND1 | COMMAND2`. The output of COMMAND1 is sent to the input of COMMAND2. Useful for combining simple commands together into complex operations - a core part of the unix philosophy.
- `>` (output redirection): `COMMAND > FILE`. Write the standard output of COMMAND to FILE. Any existing content is lost.
- `>>` (appending output redirection): `COMMAND >> FILE`. Like above, but doesn't lose content: it appends.
- `<` (input redirection): `COMMAND < FILE`. Opposite of `>`: input to COMMAND comes from FILE.
- `type COMMAND` or `which COMMAND`: Show exactly what will be run for a given command (e.g. `type python3`).
- `man COMMAND-NAME`: Browse on-line help for a command. `q` will exit, `/` will search (it uses less as its pager by default).
- `-h` and `--help`: Common command-line options to print help on a command. But they have to be implemented by each command.
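As one example of combining these with a pipe:

```bash
# Show the 10 largest directories under the current one
du -h . | sort -h | tail -n 10
```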
Triton quickstart guide
This is a quickstart guide to the Triton cluster. Each individual guide will link to additional resources with more extensive information.
Connecting to Triton
Most of the information on this page is also available in the other tutorials. This page is essentially a condensed version that gives you a recipe for quickly setting up your machine, plus the most important details. For more in-depth information, please have a look at the linked pages for each section.
There are several suggested ways to connect to Triton, as detailed in the table below, with more info found at the connecting tutorial.
| Method | Description | From where? |
|---|---|---|
| ssh from Aalto networks | Standard way of connecting via command line. Hostname is triton.aalto.fi; works from the Linux/Mac/Windows command line. See Connecting via ssh for details and options. | VPN and Aalto networks (VPN, most wired, internal servers) |
| ssh (from rest of Internet) | Use the Aalto VPN and the row above. If needed: same as above, but you must set up an SSH key first. | Whole Internet, if you first set up an SSH key AND also use passwords (since 2023) |
| VDI | "Virtual desktop interface", https://vdi.aalto.fi; from there, use ssh to connect to Triton. | Whole Internet |
| Jupyter | Since April 2024 Jupyter is part of Open OnDemand, see below. More info. | See the corresponding OOD section |
| Open OnDemand | https://ondemand.triton.aalto.fi, web-based interface to the cluster. Also known as OOD. Includes shell access, GUI, data transfer, Jupyter, and a number of GUI applications like Matlab. More info. | VPN and Aalto networks, or through VDI |
Get an account
First, you need to get an account.
Connecting via ssh
Prerequisites
This section assumes that you have a basic understanding of the Linux shell, that you know what an ssh key is, that you have an ssh public/private key pair stored in the default location, and that you have some basic understanding of the ssh config. If you lack any of these, have a look at the following pages:
Setting up ssh for passwordless access
The following guide shows you how to set up ssh so that you can connect to Triton from outside the Aalto network, or from within it using an ssh key instead of your password. In the following guide, USERNAME refers to your Aalto user name and ~/.ssh refers to your ssh config folder. (On Windows, you can use Git Bash, which will allow you to use Linux-style paths. The actual folder is normally located under C:\Users\currentuser\.ssh, where currentuser is the name of the user.)
First, create the file config in the ~/.ssh folder with the following content, or add the following lines to it if it already exists. Instead of kosh you can also use any other remote access server (see Remote Access):
Host triton
User USERNAME
Hostname triton.aalto.fi
Host kosh
User USERNAME
Hostname kosh.aalto.fi
Host triton_via_kosh
User USERNAME
Hostname triton
ProxyJump kosh
Next, you have to add your public key to the authorized keys of both kosh and Triton. For this purpose, connect to each server and add your public key to the authorized_keys file in the server's .ssh/ folder.
# Connect and log in to kosh
ssh kosh
# Open the authorized_keys file
nano .ssh/authorized_keys
# Paste your public key into this file.
# To save the file, press Ctrl+X and then confirm with y.
# Afterwards, exit from kosh
exit
Now do the same for Triton, using our defined proxy jump over kosh.
# Connect and log in to Triton via kosh
ssh triton_via_kosh
# Open the authorized_keys file
nano .ssh/authorized_keys
# Paste your public key into this file.
# To save the file, press Ctrl+X and then confirm with y.
# afterwards exit from Triton
exit
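If your system has the standard OpenSSH tool ssh-copy-id, the same key installation can be done in one step per host; this is an alternative to the manual editing above, and it works with the host aliases defined in your ssh config:

```bash
ssh-copy-id kosh              # appends your public key to kosh's authorized_keys
ssh-copy-id triton_via_kosh   # same for Triton, via the proxy jump
```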
Now, to connect to Triton you can simply type:
ssh triton
# Or, if you are not on the aalto network:
ssh triton_via_kosh
Installing and running an X Server on Windows
This tutorial explains how to install an X-Server on Windows. We will use the VcXsrv, a free X-server for this purpose.
Steps:
1. Download the installer from here
2. Run the installer.
3. Select Full under Installation Options and click Next
4. Select a target folder
To run the server:
1. Open the XLaunch program (most likely on your desktop)
2. Select Multiple Windows and click Next
3. Select Start no client and click Next
4. On the Extra settings window, click Next
5. On the Finish configuration page, click Finish
You have now started your X Server.
Set up your console
In the Git Bash or the Windows command line (cmd) terminal, before you connect to an ssh server, you have to set the display to use. Under normal circumstances, VcXsrv will start the X server as display 0.0. If for some reason the remote graphical user interface does not start later on, you can check the actual display by right-clicking on the tray icon of the X server and selecting Show log.
Search for DISPLAY
in the log file, and you will find something like:
DISPLAY=127.0.0.1:0.0
In your terminal enter:
set DISPLAY=127.0.0.1:0.0
Now you are set up to connect to the server of your choice via:
ssh -Y your.target.host
Note that on Windows you will likely need the -Y flag for X server connections, since -X does not normally work.
Data on Triton
This section gives best practices for data usage, access, and transfer to and from Triton.
Prerequisites
For data transfer, we assume that you have set up your system according to the instructions in the quick guide
Locations and quotas
| Name | Path | Quota | Backup | Locality | Purpose |
|---|---|---|---|---|---|
| Home | `$HOME` (`/home/USERNAME`) | hard quota 10GB | Nightly | all nodes | Small user-specific files, no calculation data. |
| Work | `$WRKDIR` (`/scratch/work/USERNAME`) | 200GB and 1 million files | x | all nodes | Personal working space for every user. Calculation data etc. Quota can be increased on request. |
| Scratch | `/scratch/DEPT/PROJECT` | on request | x | all nodes | Department/group-specific project directories. |
| Local temp | `/tmp` | limited by disk size | x | single-node | Primary (and usually fastest) place for single-node calculation data. Removed once user's jobs are finished on the node. |
| Local persistent | | varies | x | dedicated group servers only | Local disk persistent storage. On servers purchased for a specific group. Not backed up. |
| ramfs (login nodes only) | `/dev/shm` | limited by memory | x | single-node | Ramfs on the login node only, in-memory filesystem |
Access to data and data transfer
Prerequisites
On Windows systems, this guide assumes that you use GIT-bash,
and have rsync
installed according to this guide
Download data to Triton
To download a dataset directly to Triton, if it is available somewhere
online at a URL, you can use wget
:
wget https://url.to.some/file/on/a/server
If the data requires a login you can use:
wget --user username --ask-password https://url.to.some/file/on/a/server
Downloading directly to Triton avoids the unnecessary network traffic and time of first downloading the data to your own machine and then transferring it over to Triton.
If you need to download a larger (>10GB) dataset to Triton from the internet, please verify that the download actually succeeded. This can be done by comparing the md5 checksum (or another, computed e.g. with sha256sum), commonly provided by hosts along with the downloadable data. The resulting checksum has to be identical to the one listed online; if it is not, your data is most likely corrupted and should not be used. After downloading, simply run:
md5sum downloadedFileName
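If the provider ships a checksum file instead, the verification can be automated with the standard coreutils check mode (the file name here is illustrative):

```bash
sha256sum -c checksums.sha256   # prints OK/FAILED for each listed file
```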
For very large datasets (>100GB) you should check whether they are already on Triton. The folder for these kinds of datasets is /scratch/shareddata/dldata/; if your dataset is not there, please contact the admins to have it added. This avoids the same dataset being downloaded multiple times.
Copy data to and from Triton
The folders available on Triton are listed above. To copy small amounts of data to and from Triton from outside the Aalto network, you can either use scp or, on Linux/Mac, mount the file system using sftp (e.g. sftp://triton_via_kosh).
From inside the Aalto network (or VPN), you can also mount the Triton file system via smb (more details can be found here):
- scratch: smb://data.triton.aalto.fi/scratch/
- work: smb://data.triton.aalto.fi/work/$username/
For larger files, or folders with multiple files, if the data is already on your machine we suggest using rsync (for more details on rsync have a look here):
# Copy PATHTOLOCALFOLDER to your Triton home folder
rsync -avzc -e "ssh" PATHTOLOCALFOLDER triton_via_kosh:/home/USERNAME/
# Copy PATHTOTRITONFOLDER from your Triton home folder to LOCALFOLDER
rsync -avzc -e "ssh" triton_via_kosh:/home/USERNAME/PATHTOTRITONFOLDER LOCALFOLDER
Best practices with data
I/O can be a limiting factor when using the cluster. Probably the most important factor limiting I/O speed on Triton is file size: the smaller the files, the more inefficient their transfer. When you run a job on Triton and need to access many small files, we recommend first packing them into a large tarball:
# To tar, and compress a folder use the following command
tar -zcvf mytarball.tar.gz folder
# To only bundle the data (e.g. if you want to avoid overhead by decompressing)
# a folder use the following command
tar -cvf mytarball.tar folder
Then copy the tarball over to the node where your code is executed and extract it there, within the slurm script or your code.
# copy it over
cp mytarball.tar /tmp
# and extract it locally
tar -xf /tmp/mytarball.tar
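Putting this together, a sketch of a complete batch script that unpacks its input to the node-local disk before running (the script name, data paths, and option are illustrative):

```bash
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Unpack the input data to the node-local disk
cp mytarball.tar /tmp
tar -xf /tmp/mytarball.tar -C /tmp

module load scicomp-python-env
srun python my_script.py --data /tmp/folder   # my_script.py and --data are hypothetical
```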
If each input file is only used once, it’s more efficient to load the tarball directly from the network drive. If it fits into memory, load it into memory, if not, try to use a sequentially reading input method and have the required files in the tar-ball in the required order. For more information on storage and data usage on Triton have a look at these documents:
Submitting jobs on Triton
Prerequisites
Optimally, before submitting a job: do enough tests to have a rough idea of how long your job takes, how much memory it needs, and how many CPU(s)/GPU(s) it needs. Required Reading:
Required Setup:
Setting up your System to connect to Triton according to the connection guide
Your script and any data need to be on Triton (follow e.g. the data transfer quick-start guide)
Types of jobs:
Triton uses the Slurm scheduling system to allocate resources, like computer nodes, memory on the nodes, GPUs etc, to the submitted jobs. For more details on Slurm, have a look here. In this quickstart guide, we will only introduce the most important parameters, and skip over a lot of details. There are multiple different types of jobs available on Triton. Here we focus on the most commonly used ones.
Interactive jobs (commonly to test things or run graphical platforms with cluster resources)
Batch jobs (normal jobs submitted to the cluster without direct user input)
To run an interactive job, connect to Triton and simply run
sinteractive
from the command line. You will then be connected to a free node, and can run your interactive session. More details can be found in the tutorial for interactive jobs. If you have a specific command that you want to run you can also use:
srun your_command
The most common job to run is a batch job, i.e. you submit a script that runs your code on the cluster.
To run this kind of job, you need a small script where you set parameters for the job and submit it to the cluster.
Using a script to set the parameters has the advantage that it is
easier to modify and reuse than passing the parameters on the command line.
A basic script (e.g. in the file BatchScript.slurm
) for a slurm batch job could look as follows:
#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --mem=2G
#SBATCH --output=ScriptOutput.log
module load scicomp-python-env
srun python /path/to/script.py
To run this script use the command sbatch BatchScript.slurm
.
So, let us go through this script:
- #SBATCH --time=04:00:00 asks for a 4-hour time slot, after which the job will be stopped.
- #SBATCH --mem=2G asks for 2 GB of memory for your job.
- #SBATCH --output=ScriptOutput.log sends the terminal output of the job to the specified file.
- module load scicomp-python-env tells the node you run on to load the Python environment module.
- srun python /path/to/script.py tells the cluster to run the command python /path/to/script.py
Most programming languages and tools have their own modules that need to be loaded before they can be run. You can get a list of available
modules by running module spider
. If you need a specific version of a module, you can check the available versions by running module spider MODULENAME
(e.g. module spider r
for R
). To load a specific version you have to specify this version during the load command (e.g. module load matlab/r2023b
for the 2023b release of MATLAB). For further details please have a look at the instructions for the specific application
There are plenty more parameters that you can set for the slurm scheduler as well (a detailed list can be found here), but we are not going to discuss them in detail here, since they are likely not necessary for your first job.
Applications with graphical interface on Triton
You have two options: the recommended one is to use ondemand.triton.aalto.fi; alternatively, you can run a job.
OOD
Using Open OnDemand (OOD) is easy: log in to https://ondemand.triton.aalto.fi. From there you will have several options; if the application you want to run is part of the Interactive Apps menu, then proceed from there. For instance Matlab, Paraview, and RStudio are there. Otherwise, launch a Triton Desktop and you receive a normal Linux GUI environment of your choice (GNOME or alike).
It is like running a Linux desktop on one of Triton's compute nodes, with native access to /scratch and the module command. Start a terminal within the session and proceed.
OOD is the recommended way.
Running a job
Not recommended, but still an option.
Prerequisites
Before submitting a job: optimally, through tests, have a rough idea of how long your job takes, how much memory it needs, and how many CPU(s)/GPU(s) it needs.
Required Reading:
Required Setup:
Setting up your system to connect to Triton according to the connection guide
Your script and any data need to be on triton (follow e.g. the data transfer quick-start guide)
Specific to Windows: Install an XServer
First off: in general, using graphical user interfaces to programming languages (e.g. graphical Matlab, or RStudio) on the cluster is not recommended, since there is no real advantage over submitting a batch job. However, there are instances where you might need a large amount of resources, e.g. to visualize data, which is indeed an intended use. There are two things you need to do to run a graphical program on the cluster:
1. Start X-forwarding (when logging in to Triton, use ssh -XY ...)
2. Request an interactive job on the cluster (sinteractive)
Once you are on a node, you can load and run your program.
As for using various programming languages to run on Triton, one can see the following examples:
Getting Triton help
There are many ways to get help, and you should try them all. If you are just looking for the most important link, it is our issue tracker.
Whatever you do, these guidelines for making good support requests are very useful.
See also
Are you just looking for a Triton account? See Triton accounts.
Give enough information
We get many requests for help which are too vague to give a useful response. So, when sending us a question, always answer these questions and you’ll get the fastest useful response:
Has it ever worked? (If so, what has changed?)
What are you trying to accomplish? (Your ultimate goal, not current technical obstacle.)
What did you do? (Be specific enough to be reproducible - copy and paste exact commands you run, scripts, inputs, output messages, etc.)
What do you need? Do you need a complete solution, pointers to get started, or should we say if it will take too long and we recommend you think of other solutions first?
If you don’t know something, it’s OK, just do your best and we’ll help from there! You can also chat with us to brainstorm about issues in general. A much more detailed guide is available from Sigma2 documentation.
The Triton docs
In case you got to this page directly, you are now on the Triton and Science-IT (CS, NBE, PHYS at least) documentation site. See the main page for the index.
Your colleagues
Science is a collaborative process, even if it doesn't seem so. Academic courses don't teach you everything you need to know, so it's worth trying to work together and learn from each other - your group is the expert in its work, after all.
Daily garage
Come by one of the online Scientific computing garages any day at 13:00. It’s the best place to get problems solved fast - chat with us and see.
Issue tracker
We keep track of cluster issues at https://version.aalto.fi/gitlab/AaltoScienceIT/triton/issues. Feel free to post your issue there. Either admins or other users can reply — and you should feel free to reply and help others, too. The system is accessible from anywhere in the world, but you need to login with HAKA (using the button). All newly created issues are reported to admins by email.
This is the primary support channel, meant for general issues: general help, troubleshooting, problems with code, new software requests, and problems that may affect several users.
Note
If you get a message that you are blocked from version.aalto.fi, send an email to servicedesk. It's not your fault: it automatically blocks people when their organizational unit changes. Yes, this is bad, but it's not in our control…
If you have an Aalto visitor account, login with HAKA won’t work - use your email address and Aalto password.
Email ticketing system
For private issues you can also contact us via our email alias (on our wiki pages, login required). This is primarily intended for specific issues such as requesting new accounts, quotas, etc. Please avoid sending personal mails directly to admins, because it is best for all admins to be aware of issues, people may be absent, and personal emails are likely to be lost.
Most general issues should be reported to the issue tracker instead, not by email. Email is primarily for accounts related queries.
Research Software Engineers
Sometimes, a problem goes beyond “Triton support” and becomes “Research support”. Our Research Software Engineers are perfect for these kinds of problems: they can program with you, set up your workflow, or even handle all the technical problems for you.
Users’ mailing list
All cluster users are on the triton-users mailing list (automagically kept in sync with those who have Triton access). It is for announcements and open discussions mainly, for problem solving please try the tracker.
If you do not receive list emails, check with your local Triton admin that you are on the list. Otherwise you will miss all the announcements, including critical ones about maintenance breaks.
Triton support team
Most of us are members of your department’s support teams, so can answer questions about balancing use of Triton and your department’s computers. We also like it when people drop by and talk with us, so that we can better plan our services. In general, don’t mail us directly - use either the issue tracker above or the support email address. You can address your request to a specific person.
| Dept | Name | Location |
|---|---|---|
| CS/NBE | Mikko Hakala | T-building A243 / Otakaari 3, F354 |
| CS | Simo Tuomisto | T-building A243 |
| PHYS | Simppa Äkäslompolo | Kide, 2512 |
| PHYS | Ivan Tervanto | Kide, 2512 |
| CS/SCI | Richard Darst | T-building A243 |
| NBE | Enrico Glerean | Otakaari 3, F354 |
Science-IT trainings
We have regular training in topics relevant to HPC and scientific computing. In particular, each January and June we have a “kickstart” course which teaches you everything you need to know to do HPC work. Each Triton user should come to one of these. For the schedule, see our training page.
Getting a detailed bug report with triton-record-environment
We have a script named triton-record-environment
which will record
key environment variables, input, and output. This greatly helps in
debugging.
To use it to run a single command that gives an error:
triton-record-environment YOUR_COMMAND
Saving output to record-environment.out.txt
...
Then, just check the output of record-environment.out.txt
(it
shouldn’t have any confidential information, but make sure) and send
it to us/attach it to the bug report.
If you use Python, add the -p option; Matlab should use -m, and graphical programs should use -x (these options have to go before the command you execute).
Frequently asked questions
Job status and submission
Why can I only run a limited number of jobs at a time?
Accounts are limited in how much they can run at a time, in order to prevent a single user or a few users from hogging the entire cluster with long-running jobs if it happens to be idle (e.g. after a service break). The limit caps the maximum total remaining runtime of all the jobs of a user, so the way to run more jobs concurrently is to run shorter and/or smaller (fewer CPUs, less memory) jobs. For an in-depth explanation see http://tech.ryancox.net/2014/04/scheduler-limit-remaining-cputime-per.html and for a graphical simulator you can play around with: https://rc.byu.edu/simulation/grpcpurunmins.php . You can see the exact limits of your account with
sacctmgr -s show user $USER format=user,account,grptresrunmins%70
My job failed because of a node failure and is now held in the queue. What do I do?
Slurm is configured such that if a job fails due to some outside reason (e.g. the node where it’s running fails rather than the job itself crashing due to a bug in the job) the job is requeued in a held state. If you’re sure that everything is ok again you can release the job for scheduling with “scontrol release JOBID”. If you don’t want this behavior (i.e. you’d prefer that such failed jobs would just disappear) then you can prevent the requeuing with
#SBATCH --no-requeue
Why does my pending job show the reason BadConstraints?
This happens when a job is submitted to multiple partitions (this
is the default: it tries to go to partitions of all node types) and
it is BadConstraints for some partitions. Then, it gives the
BadConstraints reason for the whole job, even though it will
eventually run. (If constraints are bad in all partitions, it will
usually fail right when you are trying to submit it, something like
sbatch: error: Batch job submission failed: Requested node
configuration is not available
).
You don’t need to do anything, but if you want a clean status: you
can get rid of this message by limiting to partitions that
actually satisfy the constraints. For example, if you request 96
CPUs, you can limit to the Milan nodes with -p batch-milan
since those are the only nodes with more than 40 CPUs. This
example is valid as of 2023, if you are reading this later you need
to figure out what the current state is (or ask us).
How do I find the remaining runtime of my job?
You can find out the remaining time of any job that is running with
squeue -h -j JOBID -o %L
Inside a job script or sinteractive session you can use the environment variable SLURM_JOB_ID to refer to the current job ID.
How strictly are job time limits enforced? When exactly is my job killed?
Slurm kills jobs based on the partition's TimeLimit plus the OverTimeLimit parameter. The latter in our case is 60 minutes. If for instance a queue's time limit is 4 hours, Slurm will allow a job to run for 4 hours plus 1 hour, thus no longer than 5 hours. OverTimeLimit may vary, though, so don't rely on it: the partition's (aka queue's) TimeLimit is the one the end user should take into account when submitting a job. Time limits per partition can be checked with the slurm p command.
To set an exact time frame after which you want your job to be killed anyway, set the --time parameter when submitting the job. When the time limit is reached, each task in each job step is sent SIGTERM followed by SIGKILL. If you run a parallel job, set --time with srun as well. See man srun and man sbatch for details.
#SBATCH --time=1:00:00
...
srun --time=1:00:00 ...
Why does my job submission fail with no nodes matching my request?
You have requested a combination of Slurm options which no node satisfies (for example, asking for a GPU with --gpus=TYPE:N and a partition without GPUs). Figure out what the problem is and adjust your Slurm options.
Why is my job stuck waiting because nodes are down, drained, or reserved?
This usually occurs when a requested node is down, drained, or reserved, which can happen if the cluster is undergoing some work, and might happen if there are very few default nodes for Slurm to choose from. If this error occurs, the shell will usually hang after the job has been submitted, while the job waits for allocation. To find which nodes are available, use sinfo; under the STATE column you will see the states of the nodes in each partition.
To fix this, either wait for the node to become available or choose a different partition with the --partition= option, using one of the partitions from sinfo which has free and available (idle) nodes.
Accounts and access to Triton
Remote mounting
The scratch filesystem can be mounted from inside the Aalto networks
by using smb://data.triton.aalto.fi/scratch/
. For example, from
Nautilus (the file manager) on Ubuntu, use “File” -> “Connect to
server”. Outside Aalto networks, use the Aalto VPN. If it is not an
Aalto computer, you may need to use AALTO\username
as the username,
and your Aalto password.
Or you can use sshfs
– filesystem client based on SSH. Most Linux workstations
have it installed by default, if not, install it or ask your local IT
support to do it for you. For setting up your SSHFS mount from your
local workstation: create a local directory and mount remote directory
with sshfs
$ mkdir /LOCALDIR/triton
$ sshfs user1@triton.aalto.fi:/triton/PATH/TO/DIR /LOCALDIR/triton
Replace user1
with your real username and /LOCALDIR
with
a real directory on your local drive. After a successful mount, use your /LOCALDIR/triton directory as if it were local. To unmount it,
run fusermount -u /LOCALDIR/triton
.
PHYS users example, assuming that Triton and PHYS accounts are the same:
$ mkdir /localwrk/$USER/triton
$ sshfs triton.aalto.fi:/triton/tfy/work/$USER /localwrk/$USER/triton
$ cd /localwrk/$USER/triton
... (do what you need, and then unmount when there is no need any more)
$ fusermount -u /localwrk/$USER/triton
Easy access with Nautilus
The SSHFS method described above works from any console. On Linux desktops with a GUI like GNOME or Unity (i.e. all Ubuntu users), you may instead use Nautilus, the default file manager, to mount a remote SSH directory. Click File -> Connect to Server, choose SSH, enter triton.aalto.fi as the server and the directory /triton/PATH/TO/DIR you'd like to mount, and type your username. Leave the password field empty if you use an SSH key. As soon as Nautilus establishes the connection, it will appear on the left-hand side below the Network header. Now you may access it as if it were a local directory. To keep it as a bookmark, click on the mount point and press Ctrl+D; it will appear below the Bookmarks header in the same menu.
Copying files
If your workstation has no NFS mounts from Triton (CS and NBE have them; consult your local admins for exact paths), you can always use SSH. For example, copy your files from Triton to a local directory on your workstation, like:
$ sftp user1@triton.aalto.fi:/triton/path/to/dir/* .
How do I connect to a server running on a compute node?
Let’s say you have some server (e.g. debugging server, notebook server, …) running on a node. As usual, you can do this with ssh using port forwarding. It is the same principle as in several of the above questions.
For example, you want to connect from your own computer to port AAAA
on node nnnNNN
. You run this command:
ssh -L BBBB:nnnNNN:AAAA username@triton.aalto.fi
Then, when you connect to port BBBB on your own computer (localhost), it
gets forwarded straight to port AAAA on node nnnNNN. Thus only one ssh
connection gets us to any node. It is possible for BBBB to be the same as
AAAA. By the way, this works with any type of connection. The node has to
be listening on an external interface, not just the local one. To connect
to localhost:AAAA on a node, you need to repeat the above steps twice to
forward from workstation->login and login->node, with the second nnnNNN
being localhost.
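A sketch of that two-hop case, with AAAA, BBBB and nnnNNN as above:
# Hop 1, on your workstation: forward local port BBBB to the login node
$ ssh -L BBBB:localhost:BBBB username@triton.aalto.fi
# Hop 2, on the login node: forward its port BBBB to localhost:AAAA on the node
$ ssh -L BBBB:localhost:AAAA nnnNNN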
Why do graphical programs fail with “X11 connection rejected”?
In order for graphical programs on Linux to work, a file
~/.Xauthority
has to be written. If your home directory quota
(check with quota
) is exceeded, then this can’t be written and
graphical programs can’t open. If your quota is exceeded, clean up
some files, close connections, and log in again. You can find where
most of your space goes with du -h $HOME | sort -hr | less
.
This is often the case if you get X11 connection rejected because of
wrong authentication
.
Storage, file transfer and quota
Why do I get “Disk quota exceeded” errors?
Main article: Triton Quotas
Everyone should have a group quota, but no user quota. All files need to be in a proper group (either a shared group with quota, or your “user private group”). First of all, use the ‘quota’ command to make sure that neither disk space nor number of files are exceeded. Also, make sure that you use $WRKDIR for data and not $HOME. If you actually need more quota, ask us.
Solution: set your main directory and all your subdirectories to the
right group, and make sure all directories have the group s-bit
(SETGID bit, see man chmod) set. This bit means “any files created within
this directory get the directory's group”. Since your default group is
“domain users”, which has no quota, you get an immediate “quota exceeded”
by default if the s-bit is not set.
# Fix everything
# (only for $WRKDIR or group directories, still in testing):
/share/apps/bin/quotafix -sg --fix /path/to/dir/
# Manual fixing:
# Fix the setgid bit:
lfs find $WRKDIR -type d --print0 | xargs -0 chmod g+s
# Fix group:
lfs find /path/to/dir ! --group $GROUP -print0 | xargs -0 chgrp $GROUP
Why this happens: your $WRKDIR directory is owned by you and by a personal group that has the same name and GID as your UID. Quota is set per group, not per user; that is how it has been implemented since 2011, when Lustre was taken into use. Since spring 2015, Triton uses Aalto AD for authentication, which gives everyone the default group “domain users”. If you copy anything to a $WRKDIR subdirectory that has no g+s bit, you copy the files as a member of “domain users”, and the filesystem refuses because that group has no quota. If the g+s bit is set, all directories/files you copy or create get the directory's group ownership instead of the default “domain users”. There can be very confusing interactions between this and user/shared directories.
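You can check a directory's group and setgid bit with ls -ld: an s in the group permission bits means the bit is set. Illustrative output only - names, dates and paths will differ:
$ ls -ld $WRKDIR/somedir
drwxrws--- 2 user1 user1 4096 Jan  1 12:00 /scratch/work/user1/somedir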
Why did I suddenly lose access to my files?
Most likely your Kerberos ticket has expired. If you log in with a password or use kinit, you can get another ticket. See the page on data storage and remote data for more information.
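For example:
$ klist   # show current Kerberos tickets and their expiry times
$ kinit   # get a new ticket (prompts for your Aalto password)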
How do I copy files to or from Triton from outside Aalto?
This is an extension of the previous question: you are outside Aalto and have neither direct access to Triton nor access to the NFS-mounted directories on your department servers. Say you want to copy your Triton files to your home workstation. This can be done by setting up an SSH tunnel via your department SSH server. A few steps are needed: set up a tunnel to your local department server, then from your department server to Triton, and then run any rsync/sftp/ssh command you want from your client through that tunnel. The tunnel has to stay up for the whole session.
client: ssh -L9509:localhost:9509 department.ssh.server
department server: ssh -L9509:localhost:22 triton.aalto.fi
client: sftp -P 9509 localhost:/triton/own/dir/* /local/dir
Note that port 9509 is only an example; you can use any other available port. Alternatively, if you have a Linux or Mac OS X machine, you can set up a “proxy command” so you don't have to do the steps above manually every time. On your home machine/laptop, put these lines in the file ~/.ssh/config:
Host triton
ProxyCommand /usr/bin/ssh DEPARTMENTUSERNAME@department.ssh.server "/usr/bin/nc -w 10 triton.aalto.fi 22"
User TRITONUSERNAME
This creates a host alias “triton” that is proxied via the department server. So you can copy a file from your home machine/laptop to triton with a command like:
rsync filename triton:remote_filename
Why can't I write to my home directory?
Most probably you have exceeded your quota; check it with the quota
command. quota is a wrapper at /usr/local/bin/quota on the front end which
merges output from the classic quota utility (which supports NFS) and
Lustre's lfs quota. The NFS $HOME directory is limited to 10GB for
everyone and is intended mainly for initialization files. The grace period
is set to 7 days and the “hard” quota is 11GB, which means you may exceed
your 10GB quota by 1GB and then have 7 days to go below 10GB again.
However, nobody can exceed the 11GB limit.
Note: Lustre mounted under /triton
is the right place for your
simulation files. It is fast and has large quotas.
Is my data backed up?
Short answer: yes for the $HOME directory and no for $WRKDIR.
$HOME is the small NFS home directory and is backed up.
$WRKDIR (aka /triton) is fast Lustre, has a large quota, and is mounted
through InfiniBand. No backups are made of /triton, but the DDN storage
system as such is a secure and safe place for your data, though you can
always lose data by deleting it by mistake. Every user must take care of
their own work files. We provide as much disk space as each user needs,
and the amount of data is growing rapidly; that is why users should manage
their important data themselves. Consider backups of your valuable data on
DVDs/USB drives or other resources outside of Triton.
Command line interface
Can I change my default shell to bash?
Yes. Change the shell of your Aalto account and log in to Triton again to
get the newly changed shell. To change the Aalto account, log in to
kosh.aalto.fi, run kinit first, then run chsh and type /bin/bash. To find
out your current shell, run
echo $SHELL
For the record: your default shell is not set by the Triton environment but by your Aalto account.
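Put together, a minimal sketch (run on kosh.aalto.fi, not on Triton; chsh -s sets the shell directly):
$ ssh kosh.aalto.fi
$ kinit
$ chsh -s /bin/bash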
Why is ls output not colorized by default?
That is intentional, due to the high load on the Lustre filesystem. Even
though it is a high-performance filesystem, Lustre has its own
bottlenecks, and common troublemakers are ls -lr or ls --color, which
generate lots of requests to the Lustre metadata servers; regular usage
like that by all users could get the whole system stuck. Please follow the
recommendations given in the last section of Data storage on the Lustre
file system.
Why do I get locale warnings when connecting?
This happens because your computer sends its “locale” information
(language, number format, etc.) to the other computer (Triton), but Triton
doesn't have the locale your computer uses. You can unset/adjust all the
LC_* and/or LOCALE environment variables, or try setting the following in
the Triton section of your .ssh/config (see SSH for info on how this
works, you need more than you see here):
Host triton
    SetEnv LC_ALL=C
(Note: SetEnv, which sets a variable for the remote session, needs a
reasonably recent OpenSSH; SendEnv only selects which of your local
variables get sent.)
env | grep LC_ and env | grep LANG might give you hints about exactly
which environment variables are being sent from your computer (and thus
which ones you should override in the ssh config file).
Modules and environment settings
Why does my job fail with missing libraries even though I loaded the module?
You have included ‘module load module/name' but the job still fails due to
missing shared libraries, or it cannot find some binary, etc. That is a
known ZSH-related issue. In your sbatch script, please use the -l option
(aka --login), which forces bash to read all the initialization files in
/etc/profile:
#!/bin/bash -l
...
Alternatively, you can change your shell from ZSH to BASH to avoid these hacks; see the answer above.
How do I get a newer version of git?
Indeed, the default git with the Triton OS (CentOS) is quite old (v1.8.x).
To get a more modern git, run module load git
(version 2.28.0 at the time of writing).
Coding and Compiling
Why do I get an error about a missing libcuda.so.1?
You are trying to run a GPU program (using CUDA) on a node without a GPU
(and thus with no libcuda.so.1). Remember to specify that you need GPUs.
How many CPU cores should I request?
A few recommendations about the number of CPUs (see also the sketch after this list):
benchmark your application on different numbers of CPU cores: 1, 2, 12, 24, 36, and larger. Check with the developers; your application may have ready scalability benchmarks and recommendations for the choice of compiler and MPI library.
benchmark on shared memory (i.e. up to 12 CPU cores within one node) and then on different nodes (distributed memory): involving the interconnect can make a huge difference
if you are not sure about your program's scalability and you have no time for testing, don't run on more than 12 CPU cores within one node
be considerate! It is not you against the others: do not try to fill up the cluster just to look cool
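As a rough sketch of such a scaling benchmark (my_app and its --threads flag are hypothetical; substitute your own program and compare the timings afterwards, e.g. with seff):
# Submit the same work with an increasing number of cores within one node
for n in 1 2 4 8 12 24; do
    sbatch --cpus-per-task=$n --time=00:30:00 \
           --wrap "srun my_app --threads=$n"
done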
Which compilers are available?
Currently there are two different sets of compilers: (i) the GNU compilers, native to Linux, installed by default; and (ii) the Intel compilers plus MKL, a commercial suite, often the fastest on Xeons.
FGI provides all FGI sites with 7 Intel licenses, so only 7 users can compile/link with Intel at once.
My program fails with shared-library errors. What do I do?
Background: compiled code uses dynamic libraries. When a program runs, it
needs to load that code. The binary embeds the name of the library, like
libc.so.6, and at run time the loader searches the built-in paths
(/etc/ld.so.conf) and the LD_LIBRARY_PATH environment variable, loading
the first match it finds.
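Two commands are useful for seeing what actually gets loaded (my_program is a placeholder binary):
$ ldd ./my_program        # each needed library and where it resolves to
$ echo $LD_LIBRARY_PATH   # extra search paths, checked before the defaults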
All of these cases live on the fine line between the operating system, software we have installed, and software you have installed. Have a very low threshold to ask for help by coming to our daily garage with your problem: we might have a much easier solution, much faster than you can figure it out yourself.
Problem 1: library not found: In this case, something expects a certain library, but it can't be found. Possible solutions include:
Loading a module that provides the library (did you have a module loaded when you compiled the code? Are you installing a Python/R extension that needs a library from outside?)
Setting the LD_LIBRARY_PATH variable to point to the library. If you have self-compiled things this might be appropriate, but it might also be a sign that something else is wrong.
Problem 2: library version not found (such as GLIBC_2.29 not found): This usually means that a library is found and loaded, but its version is too old. This especially happens on clusters, where the operating system can't change that often.
If it's about GLIBCXX_version, you can module load gcc of a proper version, or, if you are in a conda environment, install the gcc package to bring in a newer library.
If it's about GLIBC, then it's about the base C library libc, and that is very hard to solve, since it is intrinsically connected to the operating system. Likely the program was compiled on an operating system too new for the cluster, and you should think about re-compiling it on the cluster or putting it in a container.
Setting LD_LIBRARY_PATH might help to direct to a proper version. Again, this probably indicates some other problem.
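A quick diagnostic sketch for the GLIBCXX case (the paths are examples; take the real one from the ldd output):
$ ldd ./my_program | grep libstdc++                  # which libstdc++ is picked up
$ strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX   # versions it provides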
Problem 3: you think you have the newer library loaded by a module or something, but it still gives a version error: This has sometimes happened with programs that use extensions. The base program uses an older version of the library, but an extension needs a newer version. Since the base program has already loaded the older version, even specifying the new version via LD_LIBRARY_PATH doesn't help much.
Solution: this is tricky, since the program should be using the newer version if it's on LD_LIBRARY_PATH already. Maybe it's hard-coded to use a particular older version? In that case, you may need a newer version of the base program itself (an example of this was an R extension that expected a newer GLIBCXX_version: the answer was to build Triton's R module with a newer gcc compiler version). If you hit this case, you should ask us to take a look.
Can I use both static and shared libraries?
You can use both, though for shared libraries all the libraries you link
against must be either in your $WRKDIR, in /share/apps, or installed by
default on all the compute nodes, like the vast majority of GCC and other
default Linux libraries.
How can I find out what type an executable is?
Use the file utility; it displays the type of an executable or object file:
$ file /usr/bin/gcc
/usr/bin/gcc: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV),
for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
Other issues
Can I print from Triton?
We don't have local department printers configured anywhere on Triton, but you can use SSH magic to send a file or command output to a remote printer. Run these from your local workstation, inserting the target printer name:
# printing a text file
$ ssh user@triton.aalto.fi "cat file.txt" | enscript -P printer_name
# printing a PostScript file
$ ssh user@triton.aalto.fi "cat file.ps" | lp -d printer_name -
# printing a man page
$ ssh user@triton.aalto.fi "man -t sbatch" | lp -d printer_name -
What is the triton-users mailing list?
Having a user account on Triton also means being on the triton-users at aalto.fi mailing list. That is where the support team sends all Triton-related announcements. All Triton users MUST be subscribed to the list. It is kept up to date automatically these days, but just in case you are not yet there, please send an email to your local team member and ask to be added.
How to unsubscribe? You will be removed from the mailing list as soon as your Triton account is deleted from the system. There is no other way, since we must be able to notify you about urgent things that affect data integrity and other issues.
What do node names like cn01 or gpu[001-008] mean?
All the hardware delivered by the vendor is labeled with a short name; in particular, every compute node has a label like Cn01 or GPU001. We use this notation to name compute nodes: cn01 is just the hostname for Cn01, gpu001 for GPU001, and so on. Shorthands like cn[01-224] mean all the hostnames in the range cn01, cn02, cn03 ... cn224; the same goes for gpu[001-008], tb[003-008], fn[01-02]. Similar notation can be used with SLURM commands, for example:
$ scontrol show node cn[01-12]
Why did graphical applications stop working after I changed my startup files?
Check your .bashrc and other startup files. Some modules bring in so many
dependencies that they can interfere with standard operating system
functions - in this case, SSH setting up X11 forwarding for graphical
applications.
Cluster technical overview
Hardware
Node name | Number of nodes | Node type | Year | Arch | CPU type | Memory configuration | Infiniband | GPUs | Disks
---|---|---|---|---|---|---|---|---|---
pe[1-48,65-81] | 65 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 128GB DDR4-2133 | FDR | - | 900GB HDD
pe[49-64,82] | 17 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 | 2x12 core Xeon E5 2680 v3 2.50GHz | 256GB DDR4-2133 | FDR | - | 900GB HDD
pe[83-91] | 8 | Dell PowerEdge C4130 | 2017 | bdw avx avx2 | 2x14 core Xeon E5 2680 v4 2.40GHz | 128GB DDR4-2400 | FDR | - | 900GB HDD
skl[1-48] | 48 | Dell PowerEdge C6420 | 2019 | skl avx avx2 avx512 | 2x20 core Xeon Gold 6148 2.40GHz | 192GB DDR4-2667 | EDR | - | No disk
csl[1-48] | 48 | Dell PowerEdge C6420 | 2020 | csl avx avx2 avx512 | 2x20 core Xeon Gold 6248 2.50GHz | 192GB DDR4-2667 | EDR | - | No disk
milan[1-32] | 32 | Dell PowerEdge C6525 | 2023 | milan avx avx2 | 2x64 core AMD EPYC 7713 @2.0 GHz | 512GB DDR4-3200 | HDR-100 | - | No disk
fn3 | 1 | Dell PowerEdge R940 | 2020 | avx avx2 avx512 | 4x20 core Xeon Gold 6148 2.40GHz | 2TB DDR4-2666 | EDR | - | No disk
gpu[1-10] | 10 | Dell PowerEdge C4140 | 2020 | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD
gpu[11-17,38-44] | 14 | Dell PowerEdge XE8545 | 2021, 2023 | milan avx avx2 ampere a100 | 2x24 core AMD EPYC 7413 @ 2.65GHz | 503GB DDR4-3200 | EDR | 4x A100 80GB | 440 GB SSD
gpu[20-22] | 3 | Dell PowerEdge C4130 | 2016 | hsw avx avx2 kepler | 2x6 core Xeon E5 2620 v3 2.50GHz | 128GB DDR4-2133 | EDR | 4x2 GPU K80 | 440 GB SSD
gpu[23-27] | 5 | Dell PowerEdge C4130 | 2017 | hsw avx avx2 pascal | 2x12 core Xeon E5-2680 v3 @ 2.5GHz | 256GB DDR4-2400 | EDR | 4x P100 | 720 GB SSD
gpu[28-37] | 10 | Dell PowerEdge C4140 | 2019 | skl avx avx2 avx512 volta | 2x8 core Intel Xeon Gold 6134 @ 3.2GHz | 384GB DDR4-2667 | EDR | 4x V100 32GB | 1.5 TB SSD
dgx[1-2] | 2 | Nvidia DGX-1 | 2018 | bdw avx avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 16GB | 7 TB SSD
dgx[3-7] | 5 | Nvidia DGX-1 | 2018 | bdw avx avx2 volta | 2x20 core Xeon E5-2698 v4 @ 2.2GHz | 512GB DDR4-2133 | EDR | 8x V100 32GB | 7 TB SSD
gpuamd1 | 1 | Dell PowerEdge R7525 | 2021 | rome avx avx2 mi100 | 2x8 core AMD EPYC 7262 @3.2GHz | 250GB DDR4-3200 | EDR | 3x MI100 | 32GB SSD
All Triton compute nodes are identical with respect to software and access to the common file system. Each node has its own unique hostname and IP address.
Networking
The cluster has two internal networks: InfiniBand for MPI and the Lustre
filesystem, and Gigabit Ethernet for everything else, like NFS /home
directories and ssh.
The internal networks are inaccessible from outside. Only the login node
triton.aalto.fi has an extra Ethernet connection to the outside.
The high-performance InfiniBand network has, in general, a fat-tree
configuration. Triton has several InfiniBand segments (often called
islands), distinguished by CPU architecture. The nodes within an island
are connected with different blocking ratios like 2:1, 4:1 or 8:1 (i.e. in
the 4:1 case, for every 4 downlinks there is 1 uplink to the spine
switches). The islands are ivb[1-45] with 540 cores, pe[3-91] with 2152
cores (keep in mind that pe[83-91] have 28 cores per node), four
c[xxx-xxx] segments with 600 cores each, and skl[1-48] and csl[1-48] with
1920 cores each [CHECKME]. Uplinks from those islands are mainly used for
Lustre communication.
Running MPI jobs is possible within an entire island or a segment of it,
but not across the whole cluster.
Disk arrays
All compute nodes and the front-end are connected to the DDN SFA12k
storage system: large disk arrays with the Lustre filesystem on top,
cross-mounted under the /scratch directory. The system provides about
1.8PB of disk space to end users.
Software
The cluster is running open source software infrastructure: CentOS 7, with SLURM as the scheduler and batch system.
Usage policies and legal
Triton intended use cases
Triton accounts should be used for scientific research (anything sponsored by a research PI that is aligned with Aalto's mission), including employed researchers, master's and PhD thesis workers, and others in a department staff database. In practice, anyone with a researcher, employee (not TA), or visitor job title gets an account made immediately after they accept the terms of use. For others, we ask their research supervisor for confirmation before making an account.
If you have an account, exploring computing to learn new skills or test out new ideas is OK if it doesn't use large amounts of resources. We don't mind other employees using it for lightweight computing tasks or experimentation, but we can help you find the best place if it gets large.
Common exceptions:
Undergraduate students doing projects which can’t be considered research should use resources for students.
Courses should generally not recommend Triton as a resource. Certain courses with a special reason (e.g. HPC courses) may use it after special agreement between the instructor and Triton staff.
Student projects which are research projects with a research PI may use Triton.
Acceptable Use Policy and Terms of Service
By using the Triton cluster resources, you shall be deemed to accept these conditions of use:
You shall only use Triton cluster to perform work, or transmit or store data consistent with the stated goals and policies of Aalto University and in compliance with these conditions of use.
You shall not use Triton cluster for any unlawful purpose and not (attempt to) breach or circumvent any administrative or security controls. You shall respect copyright and confidentiality agreements and protect your credentials (e.g. user login name, password, ssh private key), sensitive data and files.
You shall immediately report any known or suspected security breach or misuse of Triton cluster or credentials to the cluster support team.
Use of the cluster is at your own risk. There is no guarantee that the cluster will be available at any time or that it will suit any purpose.
Logged information, including information provided by you for registration purposes, shall be used for administrative, operational, accounting, monitoring and security purposes only, in accordance with the policy below. This information may be disclosed to other organizations anywhere in the world for these purposes to the extent allowed by local laws. Although efforts are made to maintain confidentiality, no guarantees are given.
The cluster support team is entitled to regulate and terminate access for administrative, operational and security purposes and you shall immediately comply with their instructions.
You are liable for the consequences of any violation by you of these conditions of use.
You agree to explicitly mention and acknowledge the use of Science-IT resources in your work in any reports, workshops, papers or similar that result from such usage. Appropriate reference can be found at Acknowledgement of Triton usage.
Triton data (privacy) policy
Triton is a part of Aalto University IT systems, thus is fundamentally governed by the Aalto Privacy Policy for Employees or Privacy Policy for Students, the latest versions of which can always be found on aalto.fi.
For clarity, in this section, we describe the special cases of Triton data:
In summary:
The Triton account is not a separate account, it is part of the Aalto account. We do not control that.
Triton usage statistics and logs. Triton is used for university academic research only, so this information may be used for reporting and management in any way. Identifying information won’t be public, but note that it will be used for internal operations and contacting users as needed.
Data stored on Triton. We are not the controller of this data. Data in your personal directories is controlled by you, and data in shared directories is controlled by the manager of that group. See the section below for more information on this data.
HAKA login data (JupyterHub only). This is used to secure access to JupyterHub. Only your Aalto account name is requested, it is compared and immediately discarded (Triton is already linked to your Aalto account).
The triton-users mailing list is automatically formed from all Aalto accounts in the triton-users group (everyone with an account). This is used to send service announcement and information related to scientific computing. This subscription is intrinsically tied to the Triton account and a requirement of the cluster usage. (Email information held by Aalto IT services).
We do not consider the Triton management data to consist of a personal data file (this is covered under Aalto policies), but for full disclosure we describe our use of data.
Note about research data: This section does not cover any data which users store on the cluster: for that, the user is the controller and Science-IT is only a processor. You are responsible for any administrative privacy matters. The following subsections relate only to administrative metadata.
Controller and contact
Controller: Aalto Science-IT, Aalto University, Espoo, Finland. Contact information. Please use the support email alias for account and personal data queries.
Account information comes from Aalto ITS registers.
The purpose for processing the personal data
Data is processed and stored in accordance with our agreement to provide an HPC cluster service, including accounting and reporting, in accordance with the usage agreement. The cluster may only be used to support Aalto (not personal) activities, and thus all usage metadata represents Aalto activities and is owned by Aalto University.
Types of data
Triton stores the information necessary for the provision of its services, including accounting, funding, and security. This includes logs of all operations and metadata of stored data. Data is only generated when a user uses the cluster. For example (including but not limited to):
Connection logs
Job submission and statistics logs
Filesystem and storage metadata and logs
Uses of data
Data is used in the provision of the HPC cluster service. Primarily, this is through accounting, reporting, and scheduling of tasks. Historical data will automatically adjust future cluster priority.
Sources of information
Data is produced during the use of Triton for research purposes. This data is generated directly by users while using the cluster. Account information is provided by Aalto University and in general not stored or processed here.
Data sharing
Data may be used for internal Aalto reporting and accounting (usually but not always aggregated at least at the group level), and used in non-identifiable forms in public reports and statistics. It may also be used as needed to investigate usage matters.
All users of the cluster may inspect the usage information and job statistics of the entire cluster (including all other users).
Timeframe
Data related to usage remains as long as the user has an active Triton account. Technical logging data allows accounting and reporting, and may be kept as long as needed for security and reporting purposes (indefinitely). Where possible, this may be in anonymous form.
Legal notices
Data is stored in Finland in Aalto or CSC approved facilities. Access is only via Aalto account.
You may request rectification of your data. However, most data is technical logging information which can not be removed or changed.
You may cease using the cluster, remove your research data, and request your account be closed (this does not close your Aalto account because we do not control that), but historical usage data will remain for accounting purposes. Should technical errors in data be identified, a bug should be reported.
You may access and extract your own data using the standard interfaces described in the user guide.
Identifiable administrative metadata and accounting data is not transferred outside of the EU/EEA except under proper agreement. (We have to say that, but in reality identifiable data is never transferred out of Aalto or maybe the FGCI consortium in Finland).
You may lodge a complaint with the Aalto data protection officer (see Aalto privacy notices for up to date contact information) or the Finnish supervision authority Tietosuoja.
Research and home data stored on cluster
We provide a storage service for data stored on the cluster (scratch and home directories):
Our responsibility is limited to keeping this data secure and
providing access to the corresponding Aalto accounts. The shared
directory manager should be able to make choices about data. We do
not access this data except with an explicit request, but for
management purpose we do use the file metadata (stat $filename
).
For full information, see the Science-IT data policy.
We do not look into private files without your explicit request (if you want help with something, explicitly tell us if we can look at them).
If your files are made cluster-readable (the chmod “other” permissions), you give permission for others to look at their contents. Note that this is not the default setting.
Should you report a problem, we may run stat as superuser on relevant files to determine basic metadata without further checks.
Should you have a problem that requires us to look at the contents of files or directories, we must first have your explicit permission (either in writing or in person).
User-owned data (home directories, work directories) may be deleted six months after an account expires. Use a group-based storage space instead.
Our data storage service is suitable for confidential data. You must ensure that permissions are such that technical access is limited.
Acknowledging Triton
Acknowledgement line
Triton and Science-IT get funding from departments and Aalto, so it is critical that we show them the results of our work. Thus, please note that if you use the cluster for research that is published or presented in talk or poster form, you must acknowledge the Science-IT project of the School of Science, which funds Triton and affiliated resources. By published work we mean everything: articles, doctoral theses, diplomas, reports, and other relevant publications. Use of Triton can be anything: CPUs, GPUs, or the storage system (note that the storage system is the “scratch” system, which is cross-mounted to several different departments - you can use Triton without logging into it).
An appropriate acknowledgement line might be one of:
We acknowledge the computational resources provided by the Aalto Science-IT project.
or
The calculations presented above were performed using computer resources within the Aalto University School of Science “Science-IT” project.
You can decide which one fits your text/slides better. Rephrasing is also fine; the main thing is to reference Science-IT and Aalto. (Note that this does not exist in the various funding databases; it is an Aalto-internal project.)
Reporting
This applies for:
Triton cluster usage (including data storage)
The Research Software Engineer service
SciComp garage support (if you think it’s significant enough).
We can't automatically track all publications, so we need all users to verify their publications are linked to Science-IT in ACRIS (the Aalto research information system). It takes about 30 seconds if you aren't in ACRIS now, or 5 if you are already there. All publications are required to be in ACRIS anyway, so this is a fast process.
You can see the already-reported publications here: https://research.aalto.fi/en/equipment/scienceit(27991559-92d9-4b3b-95ee-77147899d043)/publications.html
Instructions:
Log in to ACRIS: https://acris.aalto.fi
Find your publication: Research Output (left sidebar) -> Click on your publication
If your publication is not already there, then see your department’s ACRIS instructions, or the general help below.
Link it to Science-IT: scroll down to “Relations” -> “Facilities/Equipment” -> Search “Science-IT” and select it. (This is on the main page, not the separate “Relations” page.)
Click Save at the bottom of the window.
Repeat for all publications (and datasets, etc.)

You are done! Your publication should appear in our lists and support our continued funding.
More help:
Manually adding a journal article: see Adding articles manually to ACRIS (though most articles are transferred automatically).
You can also add datasets and software to ACRIS and link them to the Science-IT infrastructure. These aren't automatically transferred.
Should you have problems, first contact your department’s ACRIS help (academic coordinators). If a publication or academic output somehow can’t be linked, let us know and we will make sure that we include it in our own lists.
Other promotional pictures for Science-IT’s use
We collect pictures about the work done by our community, which are used for various other presentations or funding proposals. If you have some good pictures of research which can be shared publicly, please send them to us.
Please tell us the requested credit (author) and citation for us to use.
Please clarify license. CC-BY 4.0 is the minimum, but CC-0 is even better.
Optional: some text description about the research and/or use of resources.
Don’t worry about making things look perfect. Most things aren’t.
Send to scicomp@aalto.fi
Tutorials
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
These are designed to be read in order by every Triton user when they get their accounts (except maybe the last ones). In order to use Triton well, you should also know the Basics (A) and Linux (C) levels of the Hands-on SciComp roadmap as a prerequisite.
Cluster ecosystem explained
About these tutorials
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
Welcome to the Aalto Scientific Computing High-performance computing (HPC) tutorials. These tutorials will get you started with the Triton cluster.
Despite the HPC in the name, most of these tutorials are not about the high-performance part: instead, we get you started using and submitting jobs to the cluster. These days, many people use a cluster for simple jobs: getting more stuff done at once, not a few very big tasks. Doing the big tasks is a more specialized topic; these tutorials will introduce you to it, and you will be able to use other software for that. Programming your own HPC software is out of our scope.
Not at Aalto?
Tutorials required cluster setup
This page describes the HPC/Slurm setup needed to follow along in our HPC (=cluster computing) kickstart courses. The target audience of this page is HPC system staff who would like to direct their users to follow along with this course. What is on this page is not actual “requirements” but “if you don’t match this, you will have to tell your users”. Perhaps it could be added to your quick reference.
This course is designed to be a gentle introduction to scaling up from a local workflow to running on a cluster. It is not especially focused on the high performance part but instead the basics and running existing things on a cluster. And just to make it clear: our main lesson isn’t just following our tutorials, but teaching someone how to figure things out on other clusters, too.
Our philosophy for clusters is:
Make everything possible automatic (for example, partition selection, Slurm options). A user should only need to specify what is needed - at least for tutorials.
Standardization is good: don't break existing standard Slurm things; it should be possible to learn “base Slurm” and use it across clusters (even if it's not the optimal form)
General
These tutorials/our course will be quite easy to use for users of a cluster which has:
Slurm as the batch system
shell access (likely via ssh)
git installed without needing to load a module
Python 2 or 3 (any version) installed as python without needing to load a module
Quick reference
If you run your own cluster, create a quick reference such as Triton quick reference so that others following tutorials such as ours can quickly translate to your own cluster’s specifics. (Our hope is that all the possible local configuration is on there, so that you can translate it to your site, and that is sufficient to run).
Connecting
Connection should be possible via ssh. You probably want a cheatsheet and installation help before our course.
Slurm
Slurm is the workload manager.
Partitions are automatically detected in most cases. We have a
job_submit.lua
script that detects these cases, so that for most
practical purposes --partition
never needs to be specified:
Partition is automatically detected based on run time (except for special ones such as debug, interactive, etc).
GPU partition is automatically detected based on
--gres
.
There are no other mandatory Slurm arguments such as account or cluster selection.
seff is installed.
We use the slurm_tool wrapper, but we don't require it (it might be
useful for your users anyway; perhaps this is an opportunity for joint
maintenance):
https://github.com/jabl/slurm_tool
Applications
You use Lmod and it works across all nodes without further setup.
Git: Git is used to clone our examples (and should have network access).
Python: We assume Python is available (version 2 or 3 - we make our examples run on both) without loading a module. Many of our basic examples use this to run simple programs to demonstrate various cluster features without getting deeper into software.
Data storage
We expect this to be different for everyone. We expect most clusters have at least a home directory (small) and a work space (large and high-performance).
$HOME
is the home directory, small and backed up, not for big
research, mounted on all nodes.
$WRKDIR
is an environment variable that points to a per-user
scratch directory (large, not backed up, suitable for fast I/O across
all nodes)
We also strongly recommend group-based storage spaces for better data management.
These tutorials use Aalto's cluster as an example, but they are designed to be useful to a wide audience: most clusters operate on the same principles, with only local configuration or practices differing. These tutorials, along with a quick reference similar to ours, will give you a great start. (People running a cluster can check out our hint sheet to see what differences you may need to explain.)
We will point out things that may be different, but you need to consult your own reference to see how to do it:
The way you connect to the cluster, including remote access methods.
Exact names of batch partitions.
The slurm utility probably isn't installed; seff may not be there.
Module names for software.
You probably don’t have our Singularity container stuff installed.
Parallel and GPU stuff is probably different.
What’s next?
Introduce yourself to the cluster resources at Aalto.
About clusters and your work
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
This is the first tutorial. The next is Connecting to Triton.
Science-IT is an Aalto infrastructure for scientific computing. Its roots were in a collaboration between the Information and Computer Science department (now part of CS), the Biomedical Engineering and Computational Science department (now NBE), and the Applied Physics department. It still serves all of Aalto and is organized from the School of Science.
You are now at the first step of the Triton tutorial.
What is a cluster?
A high-performance computing cluster is basically a lot of computers, not that different from your own. While the hardware is not that much more powerful than a typical “power workstation”, what is special is that there is so much of it and you can use it together. We'll learn more about how it's used together later on.

The schematic of our sample cluster. We’ll go through this piece by piece.
The things labeled “CPU Node” and “GPU Node” aren't quite accurate in real life: the picture better depicts a whole rack of nodes. But we show it like this so that later we can pretend that one row is a CPU, to illustrate a point.
About Triton
Triton is a mid-sized heterogeneous computational Linux cluster. This means that we are not at a massive scale (though we are, after CSC, the largest publicly known cluster in Finland). We are heterogeneous: we continually add new hardware and incrementally upgrade. We are designed for scientific computing and data analysis. We use Linux as the operating system (like most supercomputers). We are a cluster: many connected nodes with a scheduling system to divide work among them. The network and some storage are shared; CPUs, memory, and other storage are not.
A real Ship of Theseus
In the Ship of Theseus thought experiment, every piece of a ship is incrementally replaced. Is it the same ship or not?
Triton is a literal Ship of Theseus. Over the ~10 years it has existed, every part has been upgraded and replaced, except possibly some random cables and other small parts. Yet, it is still Triton. Most clusters are recycled after a certain lifetime and replaced with a new one.
On an international scale of universities, the power of Triton is relatively high and it has a very diverse range of uses, though CSC has much more. Using this power requires more effort than using your own computer: you will need to become comfortable with the shell, shell scripting, managing software, managing data, and so on. Triton is a good system to learn on.
Getting help
See also
Main article: Getting Triton help
First off, realize it is hard to do everything alone - with the diversity of types of computational research and researchers, it’s not even true that everyone should know everything. If you would like to focus on your science and have someone else focus on the computational part, see our Research Software Engineer service. It’s also available for expert consultations.
There are many ways to get help. Most daily questions should go to our issue tracker (direct link), which is hosted on Aalto Gitlab (login with the HAKA button). This is especially important because many people end up asking the same questions, and in order to scale everyone needs to work together.
We have daily “SciComp garage” sessions where we provide help in person. Similarly, we have chat that can be used to ask quick questions.
Also, always search this scicomp docs site and old issues in the issue tracker.
Please, don’t send us personal email, because it won’t be tracked and might go to the wrong person or someone without time right now. Personal email is also very likely to get lost. For email contact, we have a service email address, but this should only be used for account matters. If it affects others (software, usage problems, etc), use the issue tracker, otherwise we will point you there.
Quick reference
Open the Triton quick reference - you don’t need to know what is on it (that is what these tutorials cover), but having it open now and during your work will help you a lot.
What’s next?
The next tutorial is Cluster general background knowledge.
Cluster general background knowledge
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
The following topics are required background knowledge for productive use of a remote computer cluster, and not covered in the following sequence of tutorials. You should at least browse these to confirm that you know the basics here.
The “Shell crash course”. You can read it (10-20 min), watch a short version (20 min) or longer version (1 hour). The shorter options are fine.
Building your skills
See also
Main article: Training
As time goes on, computers are getting easier and easier to use. However, research is not a consumer product, and the fact is that you need more knowledge to use Triton than most people learn in academic courses.
We have created a modular training plan, which divides useful knowledge into levels. In order to use Triton well, you need to be somewhat proficient at Linux usage (C level). In order to do parallel work, you need to be good at the D-level and also somewhat proficient at the HPC level (E-level). This tutorial and user guide covers the D-level, but it is up to you to reach the C-level first.
See our training program and plan for suggested material for self-study and lessons. We offer routine training, see our Scientific Computing in Practice lecture series page for info.
You can’t learn everything you need all at once. Instead, continually learn and know when to ask for help.
What’s next?
The next tutorial is about connecting to the cluster.
Connecting to Triton
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
The traditional way of interacting with a cluster is via the command
line in a shell in a terminal, and Secure Shell (ssh
) is
the most common way of doing that. To learn more command line basics,
see our shell crash course.
Abstract
When connecting to a cluster, our goal is to get a command-line terminal that provides a base for the rest of our work.
The standard way of connecting is via ssh, but Open OnDemand and Jupyter provide graphical environments that are useful for interactive work.
SSH host name is triton.aalto.fi; use the VPN if not on an Aalto network.
Method | Description | From where?
---|---|---
ssh (from Aalto networks) | Standard way of connecting via command line. Hostname is triton.aalto.fi. Linux/Mac/Windows: from the command line; see Connecting via ssh for details and options. | VPN and Aalto networks (VPN, Eduroam, most wired workstations, internal servers, and the aalto network on Aalto-managed computers)
ssh (from rest of Internet) | Use the Aalto VPN and the row above. If needed: same as above, but you must set up an SSH key and then connect via a jump host (kosh.aalto.fi). | Whole Internet, if you first set up an SSH key AND also use a password (since 2023)
VDI | “Virtual desktop interface”, https://vdi.aalto.fi; from there you can ssh to Triton. | Whole Internet
Jupyter | Since April 2024 Jupyter is part of Open OnDemand, see below. More info. | See the corresponding OOD section
Open OnDemand | https://ondemand.triton.aalto.fi, web-based interface to the cluster. Also known as OOD. Includes shell access, GUI, data transfer, Jupyter and a number of GUI applications like Matlab etc. More info. | VPN and Aalto networks or through VDI
Kickstart course preparation
Are you here for a SciComp KickStart course? You just need to make sure you have an account and then be able to get to a terminal (as seen in the picture below) by any of the means here, and you don’t need to worry about anything else. Everything else, we do tomorrow.
Local differences
The way you connect will be different in every site, but you should be able to get a terminal somehow.

We are working to get access to the login node. This is the gateway to all the rest of the cluster.
Getting an account
Triton uses Aalto accounts, but your account must be activated first.
The terminal
This is what you want by the end of this page: the command-line terminal.
Take the first option that works, or the one that's most comfortable for
you. However, it's good to get ssh working at some point, since it is very
useful. Later, Using the cluster from a shell will explain more about how
to actually use it.

Image of a terminal - this is what you want after this page. You'll see more about what this means in Using the cluster from a shell. Don't worry about what the commands mean yet; you can probably figure them out.
Connecting via ssh
ssh is one of the most fundamental programs for remote connections: by
using it well, you can control almost anything from anywhere. It is not
only used for connecting to the cluster, but also for data transfer. It's
worth making yourself comfortable with its use.
All Linux distributions come with an ssh client, so you don't need to do
anything. To use graphical applications, add the standard -X option;
nothing extra is needed:
$ ssh triton.aalto.fi
## OR, if your username is different:
$ ssh USERNAME@triton.aalto.fi
If you are not in the Aalto networks, use the Aalto VPN.
On a Mac, ssh is installed by default; usage is the same as in the Linux
instructions after starting the Terminal application. To run graphical
applications, you need to install an X server (XQuartz).
Install the Windows Subsystem for Linux and then use the Linux instructions. This will give you a top-level interface to scientific work on your computer and is highly recommended.
This may not work if you do not have proper admin rights on your computer (e.g. if it is university managed). Ask your IT support well in advance for help!
If you can’t use WSL, you can also use PowerShell. Start the “Windows PowerShell” program. Then, follow the Linux instructions. If you want to set up ssh keys there are a few differences but overall it is the same procedure.
If you can’t use WSL, then you can install a separate terminal application.
PuTTY is the standard SSH client. If you want to run graphical programs, you need an X server on Windows: see this link for some hints. (Side note: putty dot org is an advertisement site trying to get you to install something else.)
You should configure PuTTY with the hostname, username, and save the settings so that you can connect quickly.
If you are not on an Aalto network, there are extra steps. We
recommend you use the Aalto VPN
rather than any other workarounds. (Aalto networks are VPN, Eduroam,
wired workstations, internal servers, and aalto
network only if
using an Aalto-managed computer.)
When connecting, you can verify the ssh key fingerprints which will ensure security.
See the advanced ssh information to learn how to log in without a password, automatically save your username and more. It really will save you time.
SSH from outside the Aalto networks
If you are from outside the Aalto networks, use the ProxyJump
option (-J
) in modern OpenSSH to connect directly to Triton
without VPN. This is more work than VPN, since you have to set up
SSH keys AND use a password anyway:
$ ssh -J kosh.aalto.fi triton.aalto.fi
## OR, if your username is different:
$ ssh -J USERNAME@kosh.aalto.fi USERNAME@triton.aalto.fi
## If you do not have the -J option:
$ ssh kosh.aalto.fi
$ ssh triton.aalto.fi
SSH configuration file
This is described under the advanced ssh information, but here is a quick summary:
If you use OpenSSH (Linux/MacOS/WSL or Windows PowerShell instructions
above), the .ssh/config file (on Windows the .ssh folder is commonly under
C:\Users\YourUsername) is valuable to set up to make connecting more
seamless. With it, you can run ssh triton_via_kosh instead of using the -J
option - and this same triton_via_kosh will work with what you learn on
the Remote access to data page:
Host triton
User USERNAME
Hostname triton.aalto.fi
Host triton_via_kosh
User USERNAME
Hostname triton
ProxyJump USERNAME@kosh.aalto.fi
Aalto: Change your shell to bash
Only needed if your shell isn't already bash. If echo $SHELL
reports /bin/bash, then you are already using bash.
The thing you are interacting with when you type is the shell -
the layer around the operating system. bash
is the most common
shell, but the Aalto default shell used to be zsh
(which is more
powerful in some ways, but harder to teach with). Depending on
when you joined Aalto, your default might already be bash
.
We recommend that you check and change your shell to bash.
You can determine if your shell is bash by running echo $SHELL
.
Does it say /bin/bash
?
If not, ssh
to kosh.aalto.fi
and run chsh -s /bin/bash
.
It may take 15 minutes to update, and you will need to log in again.
Connecting via Open OnDemand
See also
OOD (Open OnDemand) is a web-based user interface to Triton, including shell access, data transfer, and a number of other applications that use graphical user interfaces. Read more in its guide. The Triton shell access app will get you the terminal that you need for basic work and the rest of these tutorials.
It is only available from Aalto networks and the VPN. Go to https://ood.triton.aalto.fi and log in with your Aalto account.
Connecting via JupyterHub
See also
Jupyter is a web-based way of doing computing. But what some people forget is that it has a full-featured terminal and console included.
Go to https://jupyter.triton.aalto.fi (not .cs.aalto.fi) and log in. Select “Slurm 5 day, 2G” and start.
To start a terminal, click File→New→Terminal - this is the shell you need. If you need to edit text files, you can also do that through JupyterLab (note: change to the right directory before creating a new file!).
Warning: the JupyterHub shell runs on a compute node, not a login node.
Some software is missing, so some things don't work. Try
ssh triton.aalto.fi from the Jupyter shell to connect to the login node.
To learn more about JupyterLab, you'll need to read up elsewhere; there
are plenty of tutorials.
Connecting via the Virtual Desktop Interface
See also
If you go to https://vdi.aalto.fi, you can access a cloud-based Aalto Linux
workstation. HTML access works from everywhere, or download the
“VMWare Horizon Client” for a better connection. Start a Ubuntu
desktop (you get Aalto Ubuntu). From there, you have to use the
normal Linux ssh instructions to connect to Triton (via the Terminal
application) using the instructions you see above: ssh
triton.aalto.fi
.
VSCode
You can use a web-based VSCode through Open OnDemand. Desktop VSCode can also connect to Triton via SSH. Read more
Exercises
If you are in the kickstart course, Connecting-1 is required for the rest of the course.
Connecting-1: Connect to Triton
Connect to Triton, and get a terminal by one of the options above.
Type the command hostname
to verify that you are on Triton.
Run whoami
to verify your username.
Solution
$ hostname
login3.triton.aalto.fi
$ whoami
darstr1
Connecting-2: (optional) Test a few command line programs
Check the uptime and load of the login node: uptime
and
htop
(q
to quit - if htop
is not available, then
top
works almost as well). What else can you learn about the
node? (You’ll learn more about these in Using the cluster from a shell, this
is just a preview to fill some time.)
Solution
You should see something like this. From this example output we can tell that the node was last rebooted 18 days ago, and the load average seems pretty high (1 = “about one processor in use”. There are 24 processors in 2023. Load of 1-5 would be normal). Someone is running things directly on the login node, which is not good:
$ uptime
17:32:25 up 18 days, 3:20, 128 users, load average: 29.46, 32.78, 34.28
More info:
$ lscpu
(long output not listed here)
$ uname -a # tells a bit about operating system info
Linux login3.triton.aalto.fi 3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
We’ll see more in Using the cluster from a shell.
Connecting-3: (optional, Aalto only) Check your default shell
Check what your default shell is: run echo $SHELL
. If it doesn’t
say /bin/bash
, go ahead and change your shell to bash if it’s
not yet (see the expandable box above).
This $SHELL
syntax is an environment variable and a pattern
you will see in the future.
Solution
$ echo $SHELL
/bin/bash
(advanced but recommended) Connecting-4: SSH configuration
If you use Linux/MacOS/WSL, start setting up a .ssh/config
file
as shown above and in SSH. You probably won’t have
time to finish this, but you can resume later. Customize it to
suit your case.
The “solution” is listed in the linked documents.
See also
What’s next?
The next tutorial is about using the terminal.
Using the cluster from a shell
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
A shell is the command-line terminal interface which is the most common method of accessing remote computers. If you are using a cluster, you aren’t just computing things. You are programming the computer to do things for you over and over again. The shell is the only option to make this work, so you have to learn a little bit.

We are still only on the login node. If you stop here, you aren’t actually using the cluster - just the entry point. If you run too much code here, you’ll get a polite message asking to use the rest of the cluster.
Terminology
A terminal is the physical or software device that sends lines and shows the output.
A shell is the interface that reads in lines, does something with the operating system, and sends it back out.
The command line interface refers to the general concept of these lines in, lines out.
All these terms are usually used somewhat interchangeably.
Why command line interfaces?
The shell is the most powerful interface to computers: you can script other programs to do things automatically. It’s much easier to script things with text, than by clicking buttons. It’s also very easy to add the command line interfaces to programs to make them scriptable. Shells, such as bash or zsh, are basically programming languages designed to connect programs together.

Image of a terminal - this is what does it all.
In the image above, we see a pretty typical example. The prompt is darstr1@login3:~$ and gives a bit of info about what computer you are running on. The command whoami tells who you are (darstr1) and hostname tells what computer you are on (login3.triton.aalto.fi).
You can also give options and arguments to programs, like this:
$ python pi.py --seed=50
The parts are like this:
python is the program that is run.
pi.py and --seed=50 are arguments. They tell the program what to do, and the program can interpret them however it wants. For Python, the first argument (here pi.py) is the Python file to run, and that Python file itself knows how to handle --seed=50.
These arguments let you control the program without modifying the
source code (or clicking buttons with your mouse!). This lets us, for
example, make a shell script that runs with many different --seed
values automatically (this is a hint about our future!).
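As a preview, a minimal bash sketch of that idea (the seed values here are arbitrary, and pi.py is the same example invocation as above):
$ for seed in 13 19759 58234; do python pi.py --seed=$seed; done
The shell runs the program once per loop iteration, substituting the loop variable into the option each time.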
You will learn all sorts of commands as you progress in your career. The command line quick reference gives the most important ones.
Files and directories
On your phone and other “app”-like things, data just exists - you don’t really know where. Now, you are the programmer doing scientific computing, so you have to make more meaningful decisions about data arrangement. This means knowing about files (a chunk of data) and directories (hierarchical storage units, also known as folders). On a cluster, you can’t throw everything into the same place. You need to sort stuff and keep it organized. File names are an essential part of automating things. Thus, you need knowledge of the storage hierarchy.
Everything on a Unix (Linux) system is organized in a hierarchy. There aren’t “drives” like “C-drive”, different storage systems can be available anywhere:
/ is the root of the filesystem
/home/ is a directory (“home directories”)
/home/darstr1/ is the home directory of the user darstr1
/home/darstr1/git/ is the directory darstr1 uses to store general git repositories
… etc
$HOME is an environment variable shortcut for your home directory.
~ is another shortcut for your home directory.
On Triton, /scratch/ is the basic place for storing research data. Also on Triton, $WRKDIR is a shortcut for your personal space in scratch (this is an environment variable).
On a graphical computer, you open a window to view files, but this is disconnected from how you run programs. In a shell, they are intrinsically connected and that is good.
The most common commands related to directories:
pwd shows the directory you are in.
cd NAME changes to a directory. All future commands are relative to the directory you change to. This is the (current) working directory.
ls [NAME] lists the contents of a directory. [NAME] is an optional directory name - by default, it lists the working directory.
mkdir NAME makes a new directory.
rm -r NAME removes a directory (or file) recursively - that and everything in it! There is no backup, be careful.
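To make these concrete, here is a short illustrative session (the directory name test-dir is just an example):
$ pwd
/home/darstr1
$ mkdir test-dir
$ cd test-dir
$ pwd
/home/darstr1/test-dir
$ cd ..
$ rm -r test-dir
(cd .. moves up one level, to the parent directory.)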
Exercises, directories
You have to be connected to the cluster and have a terminal to do these exercises.
Shell-1: Explore directories
If you are not at Aalto, try to do similar things but adjusted to your cluster’s data storage.
Print your current directory with pwd
List the contents with ls
List the contents of /scratch/, then the contents of another directory within it, and so on.
List your work directory $WRKDIR.
Change to your work directory. List it again, with a plain ls (no full path needed).
List your home directory from your work directory (you need to give it a path).
Log out and in again. List your current directory. Note how it returns to your home directory - each time you log in, you need to navigate to where you need to be.
Solution
$ pwd
/home/darstr1
$ ls
## (lots of stuff here. Or maybe nothing, if your account is
## brand new)
$ ls /scratch/
admin/ cs/ nbe/ rse/ shareddata/
apps/ elec/ other/ scicomp/ work/
courses/ eng/ math/ phys/ scip/
## (will vary for you)
$ ls $WRKDIR
## (output will vary for you. Or might be empty if nothing is
## there yet)
$ cd $WRKDIR
$ ls
## (same output as before)
$ ls $HOME
$ ls ~
## (commands give same output. Maybe empty if nothing is there
## yet)
To log out:
$ exit
Logging in again:
you@laptop$ ssh USERNAME@triton.aalto.fi
$ pwd
/home/darstr1
$ cd $WRKDIR
Shell-2: Understand the power of the working directory
Run ls /scratch/cs/
Change directory to /scratch
Now list /scratch/cs, but don’t re-type /scratch.
Solution
$ ls /scratch/cs/
$ cd /scratch
$ ls cs/
After changing your current directory, you should see the same output as from the first command with just ls cs. Like the vast majority of commands, ls resolves relative paths from your working directory: since you are already in /scratch/, you don’t need to type it again. You’ll be using this concept in your projects all the time.
Copy your code to the cluster
Usually, you would start by copying some existing code and data into the cluster (you can also develop the code straight on the cluster). Let’s talk about the code first. You would ideally have code in a git repository - this version control system (VCS) tracks files, synchronizes versions, and most importantly lets you copy them to the cluster easily.
You’d make a git repository on your own computer where you work. You would sync this with some online service (such as Github (github.com) or Aalto Gitlab (version.aalto.fi)), and then copy it to the cluster. Changes can go the other way, too. (You can also go straight from computer→cluster, but that’s beyond our scope for now.) Git is outside the scope of this tutorial, but you should see CodeRefinery’s git-intro course, and really all of CodeRefinery’s courses. This isn’t covered any further here.
We are going to pretend we are researchers working on a sample project, named hpc-examples. We’ll pretend this is our research code and keep using this example repository for the rest of the tutorials. You can look at all the files in the repository here: https://github.com/AaltoSciComp/hpc-examples/ .
Let’s clone the HPC-examples repository so that we can work on it. First, we make sure we are in our home directory (we always want to make sure we know where we are! The home directory is the default place, though):
$ cd $HOME
Then we clone our git repository:
$ git clone https://github.com/AaltoSciComp/hpc-examples/
We can change into the directory:
$ cd hpc-examples
Now we have our code in a place that can be used.
Warning
Storing your analysis codes in your home directory usually isn’t recommended, since it’s not large or high performance enough. You will learn more about where to store your work in Data storage.
Shell-3: clone the hpc-examples repository
Do the steps above. List the directory and verify it matches what you see in the Github web interface.
Is your home directory the right place to store this?
Solution
The steps are listed above. You can also check that everything is correct with git status. Output should be something like this:
$ ls
io/  mpi/  postgres/  R/  scip/
gpu/  misc/  openmp/  python/  README.rst  slurm/
$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
Normally, large projects you are working on should be in your work directory. This one is small enough that we can ignore that for now (and it keeps our exercises working on different clusters).
Shell-4: log out and re-navigate to the hpc-examples repository
Log out and log in again. Navigate to the hpc-examples repository. Resuming work is an important but often forgotten part of work.
Solution
$ exit
you@laptop$ ssh USERNAME@triton.aalto.fi
$ cd hpc-examples
$ ls
## (same output as previous exercise)
Running a basic program
But how would you actually run things? Usually, you would:
Decide where to store your code
Copy your code to the cluster (like we did above with the hpc-examples repository)
Each time you connect, change directory to the place with the code and run from there.
In our case, after changing to the hpc-examples directory, let’s run
the program pi.py
using Python (this will be our common example
for a while):
$ cd hpc-examples
$ python3 slurm/pi.py 10000
The argument “10000” is the number of iterations of the circle-in-square method of calculating π.
Danger
This is running your program on the login node! Since this takes only a second, it’s OK enough for now (so that we only have to teach one thing at a time). You will learn how to run programs properly starting in Slurm: the queuing system.
Shell-5: try calculating pi
Try doing what is above and running pi.py several times with different numbers of iterations. Try passing the --seed command line option with the values 13 and 19759.
From this point on, you need to manage your working directory. You need to be in the hpc-examples directory when appropriate, or somehow give a proper path to the program to be run.
Solution
All these are equivalent ways to run the program:
$ python3 hpc-examples/slurm/pi.py 10000
$ cd hpc-examples
$ python3 slurm/pi.py 10000
$ cd hpc-examples/slurm
$ python3 pi.py 10000
Running with different numbers of iterations:
$ cd hpc-examples
$ python3 slurm/pi.py 10000
Calculating Pi via 10000 stochastic trials
{"successes": 7815, "pi_estimate": 3.126, "iterations": 10000}
$ python slurm/pi.py 100
Calculating Pi via 100 stochastic trials
{"successes": 78, "pi_estimate": 3.12, "iterations": 100}
$ python slurm/pi.py 1000000
Calculating Pi via 1000000 stochastic trials
{"successes": 785148, "pi_estimate": 3.140592, "iterations": 1000000}
Running with different values of the seed:
$ python slurm/pi.py 10000 --seed=13
Calculating Pi via 10000 stochastic trials
{"successes": 7816, "pi_estimate": 3.1264, "iterations": 10000}
$ python slurm/pi.py 10000 --seed=19759
Calculating Pi via 10000 stochastic trials
{"successes": 7817, "pi_estimate": 3.1268, "iterations": 10000}
Shell-6: Try the --help option
Many programs have a --help option which gives a reminder of the options of the program. (Note that this has to be explicitly programmed - it’s a convention, not magic.) Try giving this option to pi.py and see what happens.
Solution
pi.py does have a --help option. Libraries that handle command line arguments for you can auto-generate this help, which is useful even if you wrote the program yourself. In this case, the help output is automatically generated by the Python standard library module argparse.
$ python slurm/pi.py --help
usage: pi.py [-h] [--nprocs NPROCS] [--seed SEED] [--sleep SLEEP]
[--optimized] [--serial SERIAL]
iters
positional arguments:
iters Number of iterations
optional arguments:
-h, --help show this help message and exit
--nprocs NPROCS Number of nprocs, using multiprocessing
--seed SEED Random seed
--sleep SLEEP Sleep this many seconds
--optimized Run an optimized vectorized version of the code
--serial SERIAL This fraction [0.0--1.0] of iterations to be run serial.
Copying and manipulating files
More info: Linux shell crash course
cp OLD NEW makes a copy of OLD called NEW
mv OLD NEW renames a file OLD to NEW
rm NAME removes a file (with no warning or backup)
A file consists of its contents and metadata. The metadata is information like user, group, timestamps, and permissions. To view metadata, use ls -l or stat.
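For example, assuming you are in the hpc-examples directory (the size, date, and group shown here are made up; yours will differ):
$ ls -l slurm/pi.py
-rw-rw-r-- 1 darstr1 darstr1 3713 Jun 12 10:00 slurm/pi.py
The columns are: permissions, link count, owning user, owning group, size in bytes, last-modified time, and name.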
Shell-7: (optional) Make a copy of pi.py
Make a copy of the pi.py program we have been using. Call it
pi-new.py
Solution
$ cd hpc-examples
$ cp slurm/pi.py slurm/pi-new.py
$ ls slurm/
... pi.py pi-new.py ...
Note that we can copy a file without being in its directory if we use a relative path.
Editing and viewing files
You will often need to edit files (in other words, change their contents). You could do this on your computer and copy them over every time, but that’s really slow. You can, and should, do basic edits directly on the cluster itself.
nano is an editor which allows you to edit files directly from the shell. This is a simple console editor which always gets the job done. Use Control-x (control and x at the same time), then y when requested and Enter, to save and exit.
less is a pager (file viewer) which lets you view files without editing them. (q to quit, / to search, n/N to repeat the search forward and backward, < for beginning of file, > for end of file)
cat dumps the contents of a file straight to the screen - sometimes useful when looking at small things.
Shell-9: Create a new file and show its contents
Create a new file poem.txt
. Write some poem in it. View the
contents of the file.
Solution
First let’s go back to our home directory, since this isn’t really an hpc-example. cd with no arguments goes to the home directory:
$ cd
$ pwd
/home/darstr1
Edit the file with nano. When done, “Control-x” “y” to exit:
$ nano poem.txt
To display the contents of the file, we can cat it or use less (q to quit less):
$ cat poem.txt
When do we need the
high performance computing
cluster for our work?
Shell-10: (optional, advanced) Edit pi-new.py
Remember the pi-new.py file you made? Add some nonsense edits to it and try to run it. See if it fails.
Solution
Remember we changed directories, so go back to the place we cloned the repository, wherever it is (could this be the main point of the exercise?):
$ cd hpc-examples
Confirm the file is there and edit the file. Notice we don’t have to go to its exact directory; a relative path is OK:
$ ls slurm/
... pi-new.py ...
$ nano slurm/pi-new.py
Try to run it:
$ python3 slurm/pi-new.py
File "slurm/pi-new.py", line 10
mxhbuhetihiugug euhuethuoegceuothoeu
^
SyntaxError: invalid syntax
Exercises
Shell-11: (advanced, to fill time) shell crash course
Browse the Linux shell crash course and see what you do and don’t know from there.
Solution
Did you think there was a solution here?
See also
This is only a short intro.
Linux shell crash course: You really need to read this for more info. You can also watch a short version (20 min) or longer version (1 hour). The shorter options are fine.
git-intro course, and really all of CodeRefinery’s courses
What’s next?
The next step is looking at the applications available on the cluster.
Applications
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
In this tutorial, we talk about the overall process of finding, building, and compiling software. These days, installing and managing scientific software takes more and more time, so we need to talk about it specifically.
Clusters, being shared systems, have more complicated software requirements. In this tutorial, you will learn how to use existing software. Be aware that installing your own is possible (and people do it all the time), but does require some attention to details. Either way, you will need to know the basics of software on Linux.
Abstract
There are many ways to get software on Triton:
Usually, you can’t install software the normal operating system way.
The cluster admins install many things for you, and they are loadable with Modules.
Sometimes, you need to install some stuff on top of that (your own libraries or environments)
You can actually install your own applications, but you need to modify instructions to work in your own directories.
Singularity containers allow you to run other hard-to-install software.
Ask us to install software if you need it. Ask for help if you try it yourself and it seems to be taking too long.
See also
Main article: Applications: General info
Local differences
Almost every site will use modules. The exact module names, and anything beyond that, will be different. Containers are becoming more common, but they are less standardized.
There are four main ways of getting software:
It’s installed through the operating system some relatively “normal” way.
Someone (a cluster admin) has already installed it for you. Or you ask someone to install what you need.
Someone has installed the base of what you need. You do some extra.
Install it yourself in a storage place you have access to. (Maybe you share it with others?)
Installed through operating system
People sometimes expect the cluster to work just like your laptop: install something through a package manager or app store. This doesn’t really work when you have hundreds of users on the same system: if you upgrade X, how many other people’s work suddenly breaks?
Thus, this isn’t really done, except for very basic, standalone applications. If it is done, this stuff isn’t upgraded and is often old: instead, we install through modules (the next point) so that people can choose the version they want.
One unfortunate side-effect is that almost all software installation instructions you find online don’t work on the cluster. Often, the software can be installed, but people don’t think to mention how in the documentation. This often requires some thought to figure out: if you can’t figure it out, ask for help!
Cluster admin has installed it for you
The good thing about the cluster is that a few people can install
software and make it usable by a lot of people. This can save you a
lot of time. Your friendly admins can install things through the
Software modules (an upcoming lesson), so that you can module load
it with very little work. You can even choose your exact version, so
that it keeps working the same even if someone else needs a newer
version.
Some clusters are very active in this, some expect the users to do more. Some things are so obscure, or so dependent on local needs, that it only makes sense to help people install it themselves. To look for what is available:
Check out the page Applications: General info and its sub-pages.
Search the issue tracker to see if someone has already requested it and documented there.
Ask us: Our issue tracker, Zulip chat or our support garage.
Modules (Software modules - next lesson) are the usual way of making this available. The command module spider NAME will search for anything of that name.
If you need something installed, contact us. The issue tracker is usually the best way to do this.
Some of the most common stuff that is available:
Python: module load scicomp-python-env for an Aalto Scientific Computing managed Python environment with various packages. More info.
R: module load r for a basic R package. More info.
Matlab: module load matlab for the latest Matlab version. More info.
Important
This is Aalto-specific. Some of these will work if you module load fgci-common at other Finnish sites (but not CSC). This is introduced in the next lesson.
Already installed, you add extra modules you need
Even if a cluster admin installs some software, you often need to build on it. One classic example is Python: we provide Python installations, but you need your own packages on top. So, you can use our base Python installation to create your own environments - self-contained setups where you can install whatever you need. Different languages have different ways of doing this (see the sketch after this list):
Python: Conda environments, virtual environments. See Python Environments with Conda.
Environments have the advantage that you can do multiple projects at once, and move between computers more easily.
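As a sketch of the idea using Python’s built-in venv (the module and path names here are only examples; at Aalto you would more likely follow the Conda instructions linked above):
$ module load scicomp-python-env
$ python3 -m venv $WRKDIR/my-project-env
$ source $WRKDIR/my-project-env/bin/activate
$ pip install numpy
The environment lives in your own directory, so you can install packages without touching the admin-managed installation.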
Install it yourself
Sometimes, you need to install software yourself - which you can do if you can tell it to install just into your home directory. Usually, the software’s instructions don’t talk about how to do this (and might not even mention things like the environments in the previous point).
One common way of doing this is containers (for example, Docker or Apptainer/Singularity). These basically allow you to put an entire operating system in one file, so that your software works everywhere. Very nice when software is difficult to install or needs to be moved from computer to computer, but can take some work to set up. See Singularity Containers for the information we have so far.
We can’t go into this more right now - ask us for help if needed. If you make a “we need X installed” request, we’ll tell you how to do it if self-installation is the easiest way.
What you should do
Check if you can find what you need already: issue tracker and searching this site.
Ask for help if you can’t find it.
Once you get there, make your software nice and reusable, so that others won’t have the same problems you did: make it easy to install and reusable. Contact the Research Software Engineers for help!
Exercises
These are more for thinking than anything.
Applications-1: Check your needs
Find the Applications page linked above, the issue tracker, etc., and check whether we already have your software installed. See if we have what you need, using any of the strategies on that list.
(optional) Applications-2: Your group’s needs
Discuss among your group what software you need, whether it’s available, and how you might get it. Can they tell you how to get started?
What’s next?
The next tutorial covers software modules in more detail.
Software modules
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
There are hundreds of people using every cluster. They all have different software needs, including conflicting versions required for different projects! How do we handle this without making a mess, or one person breaking the cluster for everyone?
This is actually a very hard problem - but one that is solved within certain parameters. Software installation and management takes up a huge amount of our time, but we try to make it easy for our users. Still, it can end up taking a lot of your effort as well.
Abstract
We use the standard Lmod module system, which makes more software available by adjusting environment variables like PATH.
module spider NAME searches for NAME.
module load NAME loads the module of that name. Sometimes, it can’t until you module load something else (read the module spider message carefully).
See the Triton quick reference for a module command cheatsheet.
Local differences
Almost every site uses modules, and most use the same Lmod system we use here. But, the exact module names you can load will be different.
Introduction to modules
The answer is the standard “module” system Lmod. It allows us to have an unlimited number of different software packages installed, and the user can select what they want. Modules include everything: compilers (plus their required development files), libraries, and programs. If you need a program installed, we will put it in the module system.
In a system the size of Triton, it just isn’t possible to install all software by default for every user.
A module lets you adjust what software is available, and makes it easy to switch between different versions.
As an example, let’s inspect the anaconda module with module show anaconda:
$ module show anaconda
----------------------------------------------------------------------------
/share/apps/anaconda-ci/fgci-centos7-anaconda/modules/anaconda/2023-01.lua:
----------------------------------------------------------------------------
whatis("Name : anaconda")
whatis("Version : 2023-01")
help([[This is an automatically created Anaconda installation.]])
prepend_path("PATH","/share/apps/anaconda-ci/fgci-centos7-anaconda/software/anaconda/2023-01/2eea7963/bin")
setenv("CONDA_PREFIX","/share/apps/anaconda-ci/fgci-centos7-anaconda/software/anaconda/2023-01/2eea7963")
setenv("GUROBI_HOME","/share/apps/anaconda-ci/fgci-centos7-anaconda/software/anaconda/2023-01/2eea7963")
setenv("GRB_LICENSE_FILE","/share/apps/manual_installations/gurobi/license/gurobi.lic")
The command shows some meta-info (name of the module, its version, etc.). When you load this module, it adjusts various environment paths (as you see there), so that when you type python it runs the program from /share/apps/anaconda-ci/fgci-centos7-anaconda/software/anaconda/2023-01/2eea7963/bin.
This is almost magic: we can have many versions of any software installed,
and everyone can pick what they want, with no conflicts.
Loading modules
Let’s dive right into an example and load a module.
Local differences
If you are not at Aalto, you need to figure out what modules exist for you. The basic principles probably work on almost any cluster.
Let’s assume you’ve written a Python script that is only compatible with Python version 3.7.0 or higher. You open a shell to find out where and what version our Python is. The type command looks up which program will actually run for a given name - very useful when testing modules (if this doesn’t work, use which):
$ type python3
python3 is /usr/bin/python3
$ python3 -V
Python 3.6.8
But you need a newer version of Python. To this end, you can load the anaconda module using the module load anaconda command; it has a more up-to-date Python with lots of libraries already included:
$ module load anaconda
$ type python3
python3 is /share/apps/anaconda-ci/fgci-centos7-anaconda/software/anaconda/2023-01/2eea7963/bin/python3
$ python3 -V
Python 3.10.8
As you see, you now have a newer version of Python, in a different directory.
You can see a list of all the loaded modules in our working shell using the module list command:
$ module list
Currently Loaded Modules:
1) anaconda/2023-01
Note
The module load and module list commands can be abbreviated as ml.
Let’s use the module purge command to unload all the loaded modules:
$ module purge
Or explicitly unload the anaconda module by using the module unload anaconda command:
$ module unload anaconda
You can load any number of modules in your open shell, your scripts, etc. You could load modules in your ~/.bash_profile, but then they will always be loaded automatically - this regularly causes hard-to-diagnose bugs!
Module versions
What’s the difference between module load anaconda and module load anaconda/2023-01?
The first, anaconda, loads the version that Lmod assumes to be the latest one - which might change someday! Suddenly, things don’t work anymore and you have to fix them.
The second, anaconda/2023-01, loads that exact version, which won’t change. Once you want stability (possibly from day one!), it’s usually a good idea to load a specific version, so that your environment will stay the same until you are done.
Hierarchical modules
Hierarchical modules mean that you have to load one module before you can load another. The prerequisite is usually a compiler:
For example, let’s load a newer version of R:
$ module load r/4.2.2
Lmod has detected the following error: These module(s) or
extension(s) exist but cannot be loaded as requested: "r/4.2.2"
Try: "module spider r/4.2.2" to see how to load the module(s).
Lmod says that the modules exist but can’t be loaded, but gives a hint for what to do next. Let’s do that:
$ module spider r/4.2.2
----------------------------------------------------------------------------
r: r/4.2.2
----------------------------------------------------------------------------
You will need to load all module(s) on any one of the lines below before the "r/4.2.2" module is available to load.
gcc/11.3.0
Help:
...
So now we can load it (we can do it in one line):
$ module load gcc/11.3.0 r/4.2.2
$ R --version
R version 4.2.2 (2022-10-31) -- "Innocent and Trusting"
What’s going on under the hood here?
In Linux systems, different environment variables like $PATH and $LD_LIBRARY_PATH help figure out how to run programs. Modules just cleverly manipulate these so that you can find the software you need, even if there are multiple versions available. You can see these variables with the echo command, e.g. echo $PATH.
When you load a module in a shell, the module command changes the current shell’s environment variables, and the environment variables are passed on to all the child processes.
You can explore more with module show NAME
.
Making a module collection
There is a basic dependency/conflict system to handle module dependencies. Each time you load a module, it resolves all the dependencies. This can result in long loading times, or be annoying to do each time you log in to the system. However, there is a solution: module save COLLECTION_NAME and module restore COLLECTION_NAME.
Let’s see how to do this in an example.
Let’s say that for compiling / running your program you need:
a compiler
CMake
MPI libraries
FFTW libraries
BLAS libraries
You could run this each time you want to compile/run your code:
$ module load gcc/9.2.0 cmake/3.15.3 openmpi/3.1.4 fftw/3.3.8-openmpi openblas/0.3.7
$ module list # 15 modules
Let’s say this environment works for you. Now you can save it with module save MY-ENV-NAME. Then module purge to unload everything. Now, do module restore MY-ENV-NAME:
$ module save my-env
$ module purge
$ module restore my-env
$ module list # same 15 modules
Generally, it is a good idea to save your modules as a collection to have your desired modules all set up each time you want to re-compile/re-build.
So the subsequent times that you want to compile/build, you simply module restore my-env, and this way you can be sure you have the same environment as before.
Note
You may occasionally need to rebuild your collections in case we re-organize things (it will prompt you to rebuild your collection and you simply save it again).
Full reference
Command | Description
---|---
module load NAME | load module
module avail | list all modules
module spider NAME | search modules
module spider NAME/VERSION | show prerequisite modules to this one
module list | list currently loaded modules
module show NAME | details on a module
module help NAME | details on a module
module unload NAME | unload a module
module save ALIAS | save module collection to this alias (saved in ~/.lmod.d/)
module savelist | list all saved collections
module describe ALIAS | details on a collection
module restore ALIAS | load saved module collection (faster than loading individually)
module purge | unload all loaded modules (faster than unloading individually)
Final notes
If you have loaded modules when you build/install software, remember to load the same modules when you run the software (also in Slurm jobs). You’ll learn about running jobs later, but the module load should usually be put into the job script.
The modules used to compile and run a program become part of its environment and must always be loaded.
We use the Lmod system and Lmod works by changing environment variables. Thus, they must be sourced by a shell and are only transferred to child processes. Anything that clears the environment clears the loaded modules too. Module loading is done by special functions (not scripts) that are shell-specific and set environment variables.
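For example, a minimal sketch of a future job script (batch scripts are covered in the Slurm tutorials; the module name is just an example):
#!/bin/bash
#SBATCH --time=00:10:00
# Load the same modules you used when building/testing:
module load anaconda
python3 slurm/pi.py 10000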
Triton modules are also available on Aalto Linux: use module load triton-modules to make them available.
Some modules are provided by Aalto Science-IT, and on some clusters they could be provided by others, too. You could even make your own user modules.
Exercises
Before each exercise, run module purge to clear all modules.
If you aren’t at Aalto, many of these things won’t work - you’ll have to check your local documentation for what the equivalents are.
Modules-1: Basics
Run module avail and check what you see. Find a software package that has many different versions available. Load the oldest version.
Solution
Let’s use anaconda as an example. To see all available versions of anaconda, we can either use module avail anaconda or the better option module spider anaconda. The oldest version of anaconda is anaconda/2020-01-tf1. We can load it using module load anaconda/2020-01-tf1.
Modules-2: Modules and PATH
PATH is an environment variable that shows where programs are run from. See its current value using echo $PATH.
type is a command line tool (a shell builtin, so your shell may not support it, but bash and zsh do) which tells you the full path of what will be run for a given command name - basically, it looks up the command in PATH.
Run echo $PATH and type python.
module load anaconda
Re-run echo $PATH and type python. How does it change?
Solution
echo $PATH should print something like this:
.../usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin...
Your PATH is most likely longer and doesn’t have to look exactly like this.
type python should output something like this:
python is /usr/bin/python
After module load anaconda, type python should print something like
/share/apps/anaconda-ci/fgci-centos7-anaconda/software/anaconda/2023-01/2eea7963/bin/python
and you should see the same path added to your PATH.
Modules-3: Complex module and PATH
Check the value of $PATH. Then, load the module py-gpaw. List what it loaded. Check the value of PATH again. Why is there so much stuff? Can you find a module command that explains it?
Solution
Running module list shows you that over 50 modules have been loaded. All of these are dependencies of py-gpaw, and as such were loaded alongside it. You can see the dependencies of a module using module show NAME. In the case of module show py-gpaw you can see that py-gpaw loads several other modules when it is loaded. Some of these modules also load their own dependencies.
Modules-4: Hierarchical modules
How can you load the module quantum-espresso/7.1?
$ ml load quantum-espresso/7.1
Lmod has detected the following error: These module(s) or
extension(s) exist but cannot be loaded as requested: "quantum-espresso/7.1"
Try: "module spider quantum-espresso/7.1" to see how to load the module(s).
Solution
This is a double-hierarchical module that can be built using two different toolchains, so you have a choice to make when loading:
$ module spider quantum-espresso/7.1
...
You will need to load all module(s) on any one of the lines
below before the "quantum-espresso/7.1" module is available to load.
gcc/11.3.0 openmpi/4.1.5
intel-oneapi-compilers/2023.1.0 openmpi/4.1.5
So here we go: we load the modules, then use which to verify that one of the programs can be found:
$ module load gcc/11.3.0 openmpi/4.1.5 quantum-espresso/7.1
$ which pw.x
/share/apps/scibuilder-spack/aalto-centos7/2023-01/software/linux-centos7-haswell/gcc-11.3.0/quantum-espresso-7.1-sxtbtq2/bin/pw.x
Modules-5: Modules and dependencies
Load a module with many dependencies, such as r-ggplot2, and save it as a collection. Purge your modules, and restore the collection.
Solution
Save the collection with module save my_env. After module purge you can load your collection again with module restore my_env. Making a collection can be particularly useful if you have a job that depends on a large number of separate modules, in which case a collection saves you the trouble of loading them one by one.
See also
Lmod documentation https://lmod.readthedocs.io/en/latest/
The “User documentation” https://lmod.readthedocs.io/en/latest/010_user.html
What’s next?
The next tutorial covers data storage.
Data storage
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
These days, computing is as much (or more) about data as the actual computing power. And data is about more than the number of petabytes: it is easy to let it get unorganized, or to store it in a way that slows down the computation.
In this tutorial, we go over places to store data on Triton and how to choose between them. The next tutorial tells how to access it remotely.
Abstract
See the Triton quick reference
There are many places to store files since they all make a different trade-off of speed, size, and backups.
We recommend scratch / $WRKDIR (below) for most cases.
We are a standard Linux cluster with these options:
$HOME = /home/$USER: 10GB, backed up, not made larger
Scratch is large but not backed up:
$WRKDIR = /scratch/work/$USER: personal work directory
/scratch/DEPARTMENT/NAME/: group-based shared directories (recommended for most work, group leaders can request them)
/tmp: temporary directory, per-user mounted in jobs and automatically cleaned up
/l/: local persistent storage on some group servers
$XDG_RUNTIME_DIR: ramfs on login node
See Remote access to data for how to transfer and access the data from other computers.

We are now looking at the data storage of a cluster.
Basics
Triton has various ways to store data. Each has a purpose, and when you are dealing with large data sets or intensive I/O, efficiency becomes important.
Roughly, we have small home directories (only for configuration files), large Lustre (scratch and work: the primary calculation data), and special places for scratch during computations (local disks). At Aalto, there are Aalto home, project, and archive directories which, unlike Triton, are backed up, but they don’t scale to the size of Triton.
Filesystem performance can be measured by both IOPS (input-output operations per second) and stream I/O speed. /usr/bin/time -v can give you some hints here. You can see the profiling page for more information.
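For example, to get a rough idea of a single program’s I/O (using the pi.py example from earlier; output trimmed, and the numbers will vary):
$ /usr/bin/time -v python3 slurm/pi.py 1000000
...
File system inputs: 0
File system outputs: 8
...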
Think about I/O before you start! - General notes
When people think of computer speed, they usually think of CPU speed. But this is missing an important factor: How fast can data get to the CPU? In many cases, input/output (IO) is the true bottleneck and must be considered just as much as processor speed. In fact, modern computers and especially GPUs are so fast that it becomes very easy for a few GPUs with bad data access patterns to bring the cluster down for everyone.
The solution is similar to how you have to consider memory: there are different types of filesystems with different tradeoffs between speed, size, and backups, and you have to use the right one for the right job. Often, you have to use several in tandem: for example, store original data on archive, put your working copy on scratch, and maybe even make a per-calculation copy on local disks. Check out wikipedia:Memory Hierarchy and wikipedia:List of interface bit rates.
The following factors are useful to consider:
How much I/O are you doing in the first place? Do you continually re-read the same data?
What’s the pattern of your I/O and which filesystem is best for it? If you read all at once, scratch is fine. But if there are many small files or random access, local disks may help.
Do you write log files/checkpoints more often than is needed?
Some programs use local disk as swap-space. Only turn it on if you know it is reasonable.
There’s a checklist in the storage details page.
Avoid many small files! Use a few big ones instead. (we have a dedicated page on the matter)
Available data storage options
Each storage location has different sizes, speed, types of backups, and availability. You need to balance between these. Most routine work should go into scratch (group directories) or work (personal). Small configuration and similar can go into your home directory.
Name | Path | Quota | Backup | Locality | Purpose
---|---|---|---|---|---
Home | $HOME (= /home/$USER) | hard quota 10GB | Nightly | all nodes | Small user specific files, no calculation data.
Work | $WRKDIR (= /scratch/work/$USER) | 200GB and 1 million files | x | all nodes | Personal working space for every user. Calculation data etc. Quota can be increased on request.
Scratch | /scratch/DEPARTMENT/NAME/ | on request | x | all nodes | Department/group specific project directories.
Local temp | /tmp | limited by disk size | x | single-node | Primary (and usually fastest) place for single-node calculation data. Removed once user’s jobs are finished on the node.
Local persistent | /l/ | varies | x | dedicated group servers only | Local disk persistent storage. On servers purchased for a specific group. Not backed up.
ramfs (login nodes only) | $XDG_RUNTIME_DIR | limited by memory | x | single-node | Ramfs on the login node only, in-memory filesystem
Home directories
The place you start when you log in. The home directory should be used for init files, small config files, etc. It is not suitable for storing calculation data. Home directories are backed up daily. You usually want to use scratch instead.
scratch and work: Lustre
Scratch is the big, high-performance, 2PB Triton storage. It is the primary place for calculations, data analysis, etc. It is not backed up, but it is reliable against hardware failures (RAID6, redundant servers) - not, however, against human error. It is shared on all nodes, and has very fast access. It is divided into two parts, scratch (by groups) and work (per-user). In general, always change to $WRKDIR or a group scratch directory when you first log in and start doing work. (Note: home and work may be deleted six months after your account expires; use a group-based space instead.)
Lustre separates metadata and contents onto separate object and metadata servers. This allows fast access to large files, but induces a larger overhead than normal filesystems. See our small files page for more information.
Local disks
Local disks are on each node separately. They are used for the fastest I/O with single-node jobs and are cleaned up after the job is finished. Since 2019, things have gotten a bit more complicated given that our newest (skl) nodes don’t have local disks. If you want to ensure you have local storage, submit your job with --gres=spindle.
See the Compute node local drives page for further details and script examples.
ramfs - fast and highly temporary storage
On login nodes only, $XDG_RUNTIME_DIR is a ramfs, which means that it looks like files but is stored only in memory. Because of this, it is extremely fast, but has no persistence whatsoever. Use it if you have to make small temporary files that don’t need to last long. Note that this is no different from just holding the data in memory - if you can hold the data in memory directly, that’s better.
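For example (the path shown is typical of systemd systems and will differ for you; the file name is hypothetical):
$ echo $XDG_RUNTIME_DIR
/run/user/123456
$ cp small-temp-file.dat $XDG_RUNTIME_DIR/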
Other Aalto data storage locations
Aalto has other non-Triton data storage locations available. See Filesystem details and Science-IT department data principles for more info.
Quotas
All directories under /scratch (as well as /home) have quotas. Two quotas are set per-filesystem: disk space and file number. Quotas exist not because we need to limit space, but because we need to make people think before using large amounts of space. Ask us if you need more.
Disk quota and current usage are printed with the command quota. ‘space’ is the disk-space limit and ‘files’ the limit on the total number of files. There are separate quotas for each group the user is a member of.
$ quota
User quotas for darstr1
Filesystem space quota limit grace files quota limit grace
/home 484M 977M 1075M 10264 0 0
/scratch 3237G 200G 210G - 158M 1M 1M -
Group quotas
Filesystem group space quota limit grace files quota limit grace
/scratch domain users 132G 10M 10M - 310M 5000 5000 -
/scratch some-group 534G 524G 524G - 7534 1000M 1000M -
/scratch other-group 16T 20T 20T - 1088M 5M 5M -
If you get a quota error, see the quotas page for a solution.
Remote access
The next tutorial, Remote access to data, covers accessing the data from your own computer.
Exercises
Most of these exercises will be specific to your local site. Use this time to review your local guides to see how they are adapted to your site.
Data storage locations:
Storage-1: Review data storage locations
(Optional) Look at the list of data storage locations above. Also look at the Filesystem details. Which do you think are suitable for your work? Do you need to share with others?
Storage-2: Your group’s data storage locations
Ask your group what they use and if you can use that, too.
Misc:
Storage-3: Common errors
What do all of the following have in common?
A job is submitted but fails with no output or messages.
I can’t start a Jupyter server on jupyter.triton.
Some files are randomly empty. Or the file had content, I tried to save it again, and now it’s empty!
I can’t log in.
I can log in with ssh, but ssh -X doesn’t work for graphical programs.
I get an error message about corruption, such as InvalidArchiveError("Error with archive ... You probably need to delete and re-download or re-create this file.
I can’t install my own Python/R/etc libraries.
Solution
All of these can be caused by exceeding the quota.
(don’t worry, “can’t log in” doesn’t apply to basic ssh login, so you can always still fix it yourself)
About filesystem performance:
strace is a command which tracks system calls - basically, the number of times the operating system has to do something. It can be used as a rudimentary way to see how much I/O load there is.
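The general form is (COMMAND is whatever you want to profile):
$ strace -c COMMAND
At the end, strace prints a summary table of each system call and how many times it was made.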
Storage-4: strace and I/O operations
Use strace -c to compare the number of system calls in ls and ls -l on a directory with many files. On Triton, you can use the directory /scratch/scip/lustre_2017/many-files/ as a place with many files in it. How many system calls per file were there for each option?
Solution
Running strace -c ls /scratch/scip/lustre_2017/many-files/ shows you that ls took 171 system calls to get the information. By comparison, ls -l takes 5210 system calls due to all the additional information it gives. This might not matter in a normal situation, but these system calls can quickly pile up if used in a script.
Storage-5: strace and time
Using strace -c, compare the times of find and lfs find on the directory mentioned above. Why is it different?
(advanced) Storage-6: Benchmarking
(this exercise requires slurm knowledge from future tutorials and also other slurm knowledge).
Clone the https://github.com/AaltoSciComp/hpc-examples/ git repository to your personal work directory. Change to the io directory. Create a temporary directory and…
Run create_iodata.sh to make some data files in data/
Compare the IO operations of find and lfs find on this directory.
Use the iotest.sh script to do some basic analysis. How long does it take? Submit it as a slurm batch job.
Modify the iotest.sh script to copy the data/ directory to local storage, do the operations, then remove the data. Compare to the previous strategy.
Use tar to compress the data while it is on lustre. Unpack this tar archive to local storage, do the operations, then remove. Compare to previous strategies.
What’s next?
See also
If you are doing heavy I/O: Storage
The next tutorial is about remote data access.
Remote access to data
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
The cluster is just one part of your research: most people are constantly transferring data back and forth. Unfortunately, this can be a frustrating experience if you haven’t got everything running smoothly. In this tutorial, we’ll explain some of the main methods. See the main storage tutorial first.
Abstract
Data is also available from other places in Aalto, such as desktop workstations in some departments, shell servers, and https://vdi.aalto.fi.
Transferring data is available via ssh (the standard rsync and sftp).
Data can be mounted remotely using ssh (sshfs, from anywhere with ssh access) or SMB on your own computer (within Aalto networks; Linux/Mac: smb://data.triton.aalto.fi/PATH, Windows: \\data.triton.aalto.fi\PATH; PATH could be work/USERNAME or scratch/DEPT/GROUPNAME).
Mounting data: on your machine, you have a view of the data directly on the cluster: there is only one copy.
Copying data: there are now two copies that you have to manage.
History and background
Historically, ssh transfers have been the most common (which includes rsync (recommended these days), scp, sftp, and various other graphical programs that use these protocols) - and this is still the most robust and reliable method. There are other modern methods, but they require other things.
There are two main styles of remote data access:
Transferring data makes a new copy on the other computer. This is generally efficient for large data.
Remote mounting makes a view of the data on the other computer: when you access/modify the data on the other computer, it transparently accesses/modifies in the original place without making a copy. This is very convenient, but generally slow.
We have this already set up for you from many computers at Aalto.
Data availability throughout Aalto
Data is the basis of almost everything we do, and accessing it seamlessly throughout Aalto is a great benefit. Various other Aalto systems have the data available. However, this varies per department: each department can manage its data as it likes. So, we can’t make general promises about what is available where.
Linux shell server mounts require a valid Kerberos ticket (usually generated when you log in). On long sessions these might expire, and you have to renew them with kinit to keep going. If you get a permission denied, try kinit.
Virtual desktop interface
VDI, vdi.aalto.fi, is a Linux workstation accessible via your web browser, and useful for a lot of work. It is not Triton, but it has scratch mounted at /m/triton/scratch/. Your work folder can be accessed at /m/triton/scratch/work/USERNAME. For SCI departments, the standard paths you have on your workstations also work: /m/{cs,nbe}/{scratch,work}/.
Shell servers
Departments have various shell servers, see below. There isn’t a generally available shell server anymore.
NBE
On workstations, work directories are available at /m/nbe/work and group scratch directories at /m/nbe/scratch/PROJECT/; the same paths are available on the shell server amor.org.aalto.fi.
PHYS
Directories available on demand through SSHFS. See the Data transferring page at PHYS wiki.
CS
On workstations, work directories are available at /m/cs/work/, and group scratch directories at /m/cs/scratch/PROJECT/. The department shell server is magi.cs.aalto.fi and has these available.
Remote mounting
There are many ways to access Triton data remotely. These days, we recommend figuring out how to mount the data remotely, so that it appears as local data but is accessed over the network. This saves copying data back and forth and is better for data security, but it is slower and less reliable than local data.

Mounting data: on your machine, you have a view of the data directly on the cluster: there is only one copy.
Remote mounting using SMB
By far, remote mounting of files is the easiest method to transfer files. If you are not on the Aalto networks (wired, eduroam, or aalto with an Aalto-managed laptop), connect to the Aalto VPN first. Note that this is automatically done on some department workstations (see above) - if not, request it!
The scratch filesystem can be remote mounted using SMB inside secure Aalto networks at the URLs below.

On Windows:
scratch: \\data.triton.aalto.fi\scratch\
work: \\data.triton.aalto.fi\work\%username%\
To access these folders: Windows Explorer → This PC → Map network drive → select a free letter.

On a Mac:
scratch: smb://data.triton.aalto.fi/scratch/
work: smb://data.triton.aalto.fi/work/USERNAME/
To access these folders: Finder → Go menu item → Connect to server → use the URLs above.

On Linux (GNOME):
scratch: smb://data.triton.aalto.fi/scratch/
work: smb://data.triton.aalto.fi/work/USERNAME/
To access these folders: Files → Left sidebar → Connect to server → use the URLs above. For other Linuxes, you can probably figure it out (it varies depending on the desktop environment; look around in the file manager).
From Aalto managed computers, you can use lgw01.triton.aalto.fi instead of data.triton.aalto.fi and it might auto-login. Depending on your OS, you may need to use either your username directly or AALTO\username.
Warning
In the future, you will only be able to do this from Aalto managed computers. This remote mounting will really help your work, so we recommend you request an Aalto managed computer (citing this section) to make your work as smooth as possible (or use vdi.aalto.fi, see above).
Remote mounting using sshfs
sshfs is a neat program that lets you mount remote filesystems via ssh alone. It is well-supported in Linux, and somewhat on other operating systems. Its true advantage is that you can mount any remote ssh server - it doesn’t have to be specially set up for SMB or any other type of mounting. On Ubuntu and other Linuxes, you can mount via “File → Connect to server” using sftp://triton.aalto.fi/scratch/work/USERNAME. This also works from any shell server with data (see previous section).
The commands below do the same from the command line, making triton_work on your local computer access all files in /scratch/work/USERNAME. This can be done with other folders, too:
$ mkdir triton_work
$ sshfs USERNAME@triton.aalto.fi:/scratch/work/USERNAME triton_work
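When you are done, you can unmount it (this is the standard FUSE unmount command on Linux; on a Mac, umount triton_work does the same):
$ fusermount -u triton_work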
Note that ssh binds together many ways of accessing Triton (and other servers), with a similar syntax and options. Learning to use it well is a great investment in your future. Learn more about ssh on the ssh page - if you set up an ssh config file, it will work here, too!
For Aalto Linux workstation users: it is recommended that you mount /scratch/ under the local disk /l/. You should be able to create a subfolder under /l/ and point sshfs to that subfolder as in the example above.
Transferring data
This section tells ways you can copy data back-and-forth between Triton and your own computers. This may be more annoying for day-to-day work but is better for transferring large data.

Copying data: there are now two copies that you have to manage.
Version control
Don’t forget that you can use version control (git, etc.) for your code and other small files. This way, you transfer to/from Triton via a version control server (Aalto Gitlab, Github, etc). Often, one would develop locally (committing often, of course), pull on Triton, do some minor development directly on Triton to make it work there, then push back to the server.
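A typical round-trip might look like this sketch (the commit messages and file name are just examples):
you@laptop$ git commit -am "implement analysis"
you@laptop$ git push
$ git pull                              # on Triton
$ nano analysis.py                      # small fixes to make it work there
$ git commit -am "fix paths for cluster"
$ git push
you@laptop$ git pull                    # get the fix back locally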
Mount and copy
You know, you can do the network drive mounting (see the previous section) and copy files that way.
Using rsync
Prerequisites
To install rsync on Windows, please refer to this guide.
Rsync is good for large files since it can restart interrupted transfers - use it for large file transfers. rsync actually uses the ssh protocol, so you can rsync from anywhere you can ssh from. rsync is installed by default on Linux and Mac terminals. On Windows machines we recommend using Git Bash.
While there are better places on the internet to read about rsync, it is good to try it out by synchronising a local folder to your scratch space on Triton. Sometimes the issue with copying files is related to group permissions. This command takes care of permissions and makes sure that all your local files are identical (= same MD5 fingerprint) to your remote files:
$ rsync -avzc -e "ssh" --chmod=g+s,g+rw --group=GROUPNAME PATHTOLOCALFOLDER USERNAME@triton.aalto.fi:/scratch/DEPT/PROJECTNAME/REMOTEFOLDER/
Replace the bits in CAPS with your own case. Briefly: -a tries to preserve all attributes of the file, -v increases verbosity to see what rsync is doing, -z uses compression, -c skips files that have identical MD5 checksums, -e specifies to use ssh (not necessary, but needed for the commands coming after), --chmod sets the group permissions to shared (as is common practice on scratch project folders), and --group sets the group name to the group you belong to (note that GROUPNAME == PROJECTNAME on our scratch filesystem).
If you want to just check whether your local files differ from the remote ones, you can run rsync in “dry run” mode, so that you only see what the command would do without actually doing anything:
$ rsync --dry-run -avzc ...
Sometimes you want to copy only certain files, e.g. go through all folders, considering only files ending with .py:
$ rsync -avzc --include '*/' --include '*.py' --exclude '*' ...
Sometimes you want to copy only files under a certain size (e.g. 100MB):
$ rsync -avzc --max-size=100m ...
Rsync does NOT delete files by default, i.e. if you delete a file from the local folder, the remote file will not be deleted automatically, unless you specify the --delete option.
Please note that when working with files containing code or simple text, git is a better option for synchronising your local folder with the remote one: not only will it keep the two folders in sync, you also gain version control, so you can revert to a previous version of your code or txt/csv files.
Using sftp
The SFTP protocol uses ssh to transfer files. On Linux and Mac, the sftp command line program is the most fundamental way to do this, and it is available everywhere.
A more user-friendly way of doing this (with a nice GUI) is the Filezilla program. Make sure you are using Aalto VPN, then you can put triton.aalto.fi as SFTP server with port 22.
With all modern OS it is also possible to just open your OS file manager (e.g. Nautilus on Linux) and just put as address in the bar:
sftp://triton.aalto.fi
If you are connecting from outside and cannot use the VPN, you can connect instead to department machines like kosh.aalto.fi or amor.org.aalto.fi (for NBE). The port is 22. Note: if you do not see your shared folder, you need to manually specify the full path (i.e. the folder is there, just not yet visible).
Exercises
RemoteData-1: Mounting your work directory
Mount your work directory by SMB (or sshfs) and transfer a file to Triton. Note that for SMB, you must be connected to the Aalto VPN (from outside campus), or on eduroam or the aalto network with an Aalto laptop (on campus).
(advanced) RemoteData-2: rsync
If you have a Linux or Mac computer, or have installed rsync on Windows, study the rsync manual page and try to transfer a file.
What’s next?
The next tutorial is about how the cluster queuing system Slurm works.
Running calculations
Slurm: the queuing system
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
What is a cluster?
Triton is a large system that combines many different individual computer nodes. Hundreds of people are using Triton simultaneously. Thus resources (CPU time, memory, etc.) need to be shared among everyone.
This resource sharing is done by software called a job scheduler or workload manager, and Triton's workload manager is Slurm (which is also the dominant workload manager in the world these days). Triton users submit jobs, which are then scheduled and allocated resources by the workload manager.

Slurm allows you to control all of the computing power from the login node.
An analogy: the HPC Diner
You’re eating out at the HPC Diner. What happens when you arrive?
Scheduling resources
A host greets you and takes your party size and estimated dining time. You are given a number and asked to wait a bit.
The host looks at who is currently waiting and makes a plan.
If you are two people, you might squeeze in soon.
If you are a lot of people, the host will try to slowly free up enough tables to join to eat together.
If you are a really large party, you might need an advance reservation (or have to wait a really long time).
Groups are called when it is their turn.
Resources (tables) are used as efficiently as possible.
Cooking in the background
You don’t use your time to cook yourself.
You make an order. It goes to the back and gets cooked (possibly a lot at once!), and you can do something else.
Your food comes out when ready and you can check the results.
Asynchronous execution allows more efficient dining.
Thanks to HPC Carpentry / Sabry Razick for the idea.
The basic process
You have your program mostly working
You decide what resources you want
You ask Slurm to give you those resources
You might say "run this and let me know when done" - this is covered later in Serial Jobs.
You might want those resources to play around yourself. This is covered next in Interactive jobs.
If you are doing the first one, you come back later and check the output files.
The resources Slurm manages
Slurm comes with a multitude of parameters which you can specify to ensure you will be allocated enough memory, CPU cores, time, etc.
Imagine resource requests as boxes of a requested number of CPUs, memory, time, and any other resources requested. The smaller the box, the more likely you can get scheduled soon.
The basic resources are:
Time: While not exactly a resource, you need to specify the expected run time of each job for scheduling purposes. If you go over it by too much, your job will be killed. This is --time, for example --time=DAYS-HH:MM:SS.
Memory: Memory is needed for data in jobs. If you run out of processors, your job is slow, but if you run out of memory, then everything dies. This is --mem or --mem-per-cpu.
CPUs (also known as "processors" or "(processor) cores"): This resource lets you do things in parallel the classic way, by adding processors. Depending on how the parallelism works, there are different ways to request the CPUs - see Parallel computing: different methods explained. This is --cpus-per-task and --ntasks, but you must read that page before using these!
GPUs: Graphical Processing Units are modern, highly parallel compute units. We will discuss requesting them in GPU computing.
If you do even larger work on larger clusters, input/output bandwidth and licenses are also possible resources.
The more resources you request, the lower your priority will be in the future. So be careful what you request!
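To make these options concrete, here is a sketch of what resource requests look like on the command line (the values are arbitrary examples; hold off on the CPU options until you have read the parallel computing page):
$ srun --time=02:00:00 --mem=2G COMMAND
$ srun --time=1-12:00:00 --mem-per-cpu=500M --cpus-per-task=4 COMMAND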
See also
As always, the Triton quick reference lists all the options you need.
Other submission parameters
We won’t go into them, but there are other parameters that tell Slurm what to do. For example, you could request to only run on the latest CPU architecture. You could say you want a node all to yourself. And so on.
How many resources to request?
See also
This is one of the most fundamental questions:
You want to request enough resources, so that your code actually runs.
You don’t want to request too much, since it is wasteful and lowers your priority in the future.
Basically, people usually start by guessing: request more than you think you need at the start, for testing. Check what you have actually used (Triton: slurm history), and adjust the requests to match.
The general rule of thumb is to request the least possible, so that your stuff can run faster. That is because the less you request, the faster you are likely to be allocated resources. If you request something slightly less than a node size (note that we have different size nodes) or partition limit, you are more likely to fit into a spare spot.
For example, we have many nodes with 12 cores, and some with 20 or 24. If you request 24 cores, you have very limited options. However, you are more likely to be allocated a node if you request 10 cores. The same applies to memory: most common cutoffs are 48, 64, 128, 256GB. It’s best to use smaller values when submitting interactive jobs, and more for batch scripts.
Partitions
A Slurm partition is a set of computing nodes dedicated to a specific purpose. Examples include partitions assigned to debugging ("debug" partition), batch processing ("batch" partition), GPUs ("gpu" partition), etc.
On Triton, you don't need to worry about partitions most of the time - they are automatically set. You might need a partition in several cases though:
--partition debug gives you some nodes reserved for quick testing.
--partition interactive gives you some settings optimized for interactive work (where things aren't running constantly).
On other clusters, you might need to set a partition other times.
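For example (a sketch; the resource values are arbitrary), explicitly requesting the debug partition looks like this:
$ srun --partition=debug --time=00:05:00 --mem=100M hostname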
The command sinfo -s lists a summary of the available partitions. You can see the purpose and use of our partitions in the quick reference.
Exercises
Slurm-1: Info commands
Check out some of these commands: sinfo, sinfo -N, squeue, and squeue -a. These give you some information about Slurm's state.
What’s next?
We move on to running interactive jobs.
Interactive jobs
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
Abstract
Interactive jobs allow you to quickly test code (before scaling up) or to get more resources for manual analysis.
To run a single command interactively:
srun [SLURM OPTIONS] COMMAND - prefix any COMMAND with this to run it in Slurm
To get an interactive shell:
srun [SLURM OPTIONS] --pty bash (general Slurm)
sinteractive (Triton specific)
The exact commands often vary among clusters; check your cluster's docs.

Interactive jobs let you control a small amount of resources for development work.
Why interactive jobs?
There are two ways you can submit your jobs to the Slurm queue system: either interactively using srun, or by submitting a script using sbatch. This tutorial walks you through running your jobs interactively, and the next tutorial will go through serial batch jobs.
Some people say “the cluster is for batch computing”, but really it is to help you get your work done. Interactive jobs let you:
Run a single job in the Slurm job allocation to test parameters and make sure it works (which is easier than constantly modifying batch scripts).
Get a large amount of resources for some manual data analysis.
Interactive jobs
Let’s say you want to run the following command:
$ python3 slurm/pi.py 10000
You can submit this program to Triton using srun. All input/output still goes to your terminal (but note that graphical applications don't work this way - see below):
$ srun --mem=100M --time=0:10:00 python3 slurm/pi.py 10000
srun: job 52204499 queued and waiting for resources
Here, we are asking for 100 megabytes of memory (--mem=100M) for a duration of ten minutes (--time=0:10:00). (See the quick reference or below for more options.)
While your job - with jobid 52204499 - is waiting to be allocated resources, your shell blocks. You can open a new shell (ssh again) on the cluster and run the command squeue -u $USER or slurm q to see all the jobs you currently have waiting in the queue:
$ slurm q
JOBID PARTITION NAME TIME START_TIME STATE NODELIST(REASON)
52204499 short-ivb python3 0:00 N/A PENDING (None)
You can see information such as the state, which partition the requested nodes reside in, etc.
Once resources are allocated to your job, you see the name of the machine in the cluster your program ran on, and its output, in your terminal:
srun: job 52204499 has been allocated resources
{"pi_estimate": 3.126, "iterations": 10000, "successes": 7815}
To show that it's running on a different computer, you can srun hostname (in this case, it runs on csl42):
$ hostname
login3.triton.aalto.fi
$ srun hostname
srun: job 19039411 queued and waiting for resources
srun: job 19039411 has been allocated resources
csl42.int.triton.aalto.fi
Disadvantages
Interactive jobs are useful for debugging purposes, to test your setup and configurations before you put your tasks in a batch script for later execution.
The major disadvantages include:
It blocks your shell until it finishes
If your connection to the cluster gets interrupted, you lose the job and its output.
Keep in mind that you shouldn't open 20 shells to run 20 srun jobs at once.
Please have a look at the next tutorial about serial jobs.
Interactive shell
What if you want an actual shell to do things interactively?
Put more precisely, you want access to a node in the cluster through an interactive bash shell, with many resources available, that will let you run commands such as Python and do some basic work.
For this, you just need srun's --pty option coupled with the shell you want:
$ srun -p interactive --time=2:00:00 --mem=600M --pty bash
The command prompt will appear when the job starts, and you will have a bash shell running on one of the computation nodes, with at least 600 megabytes of memory, for a duration of 2 hours, where you can run your programs.
The option -p interactive requests a node in the interactive partition (group of nodes), which is dedicated to interactive usage (more on this later).
Warning
Remember to exit the shell when you are done! The shell will keep running if you don't, and it will count towards your usage. This wastes resources and effectively means your priority will degrade in the future.
Interactive shell with graphics
sinteractive is very similar to srun, but more clever and thus allows you to do X forwarding. It starts a screen session on the node, then sshes there and connects to the shell:
$ sinteractive --time=1:00:00 --mem=1000M
Warning
Just like with srun --pty bash, remember to exit the shell. Since there is a separate screen session running, just closing the terminal isn't enough. Exit all shells in the screen session on the node (C-d or exit) or cancel the job.
Use remote desktop if off campus
If you are off-campus, you might want to use https://vdi.aalto.fi as a virtual desktop to connect to Triton to run graphical programs: ssh from there to Triton with ssh -XY. Graphical programs run very slowly when sent across the general Internet.
Checking your jobs
When your jobs enter the queue, you need to be able to get information on how much time, memory, etc. your jobs are using in order to know what requirements to ask for. We’ll see this later in Monitoring job progress and job efficiency.
The command slurm history (or sacct --long | less) gives you information such as the actual memory used by your recent jobs, total CPU time, etc. You will learn more about these commands later on.
As shown in a previous example, the command slurm queue (or squeue -u $USER) will tell you the currently running processes, which is a good way to make sure you have stopped everything.
Setting resource parameters
Remember to set the resources you need well, otherwise you are wasting resources and lowering your priority. We went over this in Slurm: the queuing system.
Exercises
The scripts you need for the following exercises can be found in our hpc-examples repository, which we discussed in Using the cluster from a shell. You can clone the repository by running git clone https://github.com/AaltoSciComp/hpc-examples.git. Doing this creates a local copy of the repository in your current working directory. This repository will be used for most of the tutorial exercises.
Interactive-2: Time scaling
The program hpc-examples/slurm/pi.py calculates pi using a simple stochastic algorithm. The program takes one positional argument: the number of trials. The time program allows you to time any program, e.g. you can time python x.py to print the amount of time it takes.
Run the program, timing it with time, a few times, increasing the number of trials until it takes about 10 seconds: time python hpc-examples/slurm/pi.py 500, then 5000, then 50000, and so on.
Add srun in front (srun python ...). Use the seff JOBID command to see how much time the program took to run. (If you'd like to use the time command, you can run srun --mem=MEM --time=TIME time python hpc-examples/slurm/pi.py ITERS.)
Look at the job history using slurm history - can you see how much time each process used? What's the relation between TotalCPUTime and WallTime?
Solution
$ time python3 slurm/pi.py 5000
Calculating pi via 5000 stochastic trials
{"pi_estimate": 3.1384, "iterations": 5000, "successes": 3923}
real 0m0.095s
user 0m0.082s
sys 0m0.014s
$ time python3 slurm/pi.py 50000
Calculating pi via 50000 stochastic trials
{"pi_estimate": 3.13464, "iterations": 50000, "successes": 39183}
real 0m0.154s
user 0m0.134s
sys 0m0.020s
$ time python3 slurm/pi.py 500000
Calculating pi via 500000 stochastic trials
{"pi_estimate": 3.141776, "iterations": 500000, "successes": 392722}
real 0m0.792s
user 0m0.766s
sys 0m0.023s
$ time python3 slurm/pi.py 5000000
Calculating pi via 5000000 stochastic trials
{"pi_estimate": 3.1424752, "iterations": 5000000, "successes": 3928094}
real 0m6.287s
user 0m6.262s
sys 0m0.026s
$ srun python3 slurm/pi.py 5000000
srun: job 19201873 queued and waiting for resources
srun: job 19201873 has been allocated resources
Calculating pi via 5000000 stochastic trials
{"pi_estimate": 3.1424752, "iterations": 5000000, "successes": 3928094}
$ srun python3 slurm/pi.py 50000000
srun: job 19201880 queued and waiting for resources
srun: job 19201880 has been allocated resources
Calculating pi via 50000000 stochastic trials
{"pi_estimate": 3.14153752, "iterations": 50000000, "successes": 39269219}
$ srun python3 slurm/pi.py 500000000
srun: job 19201910 queued and waiting for resources
srun: job 19201910 has been allocated resources
Calculating pi via 500000000 stochastic trials
{"pi_estimate": 3.14152692, "iterations": 500000000, "successes": 392690865}
$ seff 19201873
Job ID: 19201873
Cluster: triton
User/Group: darstr1/darstr1
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:00:04
CPU Efficiency: 100.00% of 00:00:04 core-walltime
Job Wall-clock time: 00:00:04
Memory Utilized: 1.21 MB
Memory Efficiency: 0.24% of 500.00 MB
$ seff 19201880
...
CPU Utilized: 00:00:44
CPU Efficiency: 97.78% of 00:00:45 core-walltime
Job Wall-clock time: 00:00:45
...
$ seff 19201910
...
CPU Utilized: 00:07:51
CPU Efficiency: 99.58% of 00:07:53 core-walltime
Job Wall-clock time: 00:07:53
...
Each process should be visible as a separate step indexed from 0. For larger iteration counts, TotalCPUTime should be similar to WallTime, since TotalCPUTime shows the amount of time the CPUs were at full utilization, times the number of CPUs. Note that TotalCPUTime has a precision of milliseconds, whereas WallTime has a precision of seconds.
JobID JobName Start ReqMem MaxRSS TotalCPUTime WallTime Tasks CPU Ns Exit State Nodes
19201873 python3 06-06 23:18:21 500M - 00:04.044 00:00:04 none 1 1 0:0 COMP csl48
└─ extern * 06-06 23:18:21 0M 00:00.001 00:00:04 1 1 1 0:0 COMP csl48
└─ 0 python3 06-06 23:18:21 1M 00:04.043 00:00:04 1 1 1 0:0 COMP csl48
19201880 python3 06-06 23:18:35 500M - 00:44.417 00:00:45 none 1 1 0:0 COMP csl48
└─ extern * 06-06 23:18:35 1M 00:00.001 00:00:45 1 1 1 0:0 COMP csl48
└─ 0 python3 06-06 23:18:35 1M 00:44.415 00:00:45 1 1 1 0:0 COMP csl48
19201910 python3 06-06 23:19:25 500M - 07:51.107 00:07:53 none 1 1 0:0 COMP csl48
└─ extern * 06-06 23:19:25 1M 00:00.001 00:07:53 1 1 1 0:0 COMP csl48
└─ 0 python3 06-06 23:19:25 10M 07:51.106 00:07:53 1 1 1 0:0 COMP csl48
Interactive-3: Info commands
Run squeue -a to see what is running, and then run slurm job JOBID (or scontrol show job JOBID) on some running job - does anything look interesting?
Solution
There are possibly some interesting things here, if you can pick them out of all the rest:
$ slurm job 19203764
JobId=19203764 JobName=python3
UserId=darstr1(1300204) GroupId=darstr1(1300204) MCS_label=N/A
Priority=630255 Nice=0 Account=aalto_users QOS=normal
JobState=COMPLETED Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=00:00:06 TimeLimit=00:15:00 TimeMin=N/A
SubmitTime=2023-06-07T00:10:23 EligibleTime=2023-06-07T00:10:23
AccrueTime=2023-06-07T00:10:23
StartTime=2023-06-07T00:10:25 EndTime=2023-06-07T00:10:31 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2023-06-07T00:10:25 Scheduler=Main
Partition=batch-csl AllocNode:Sid=triton:4896
ReqNodeList=(null) ExcNodeList=(null)
NodeList=csl48
BatchHost=csl48
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=500M,energy=2391,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=500M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=python3
WorkDir=/home/darstr1/git/hpc-examples
Power=
Interactive-4: Showing node information
Run scontrol show node csl1. What is this? (csl1 is the name of a node on Triton - if you are not on Triton, look at the sinfo -N command and try one of those names.)
Interactive-5: Why not script srun
Some people are clever and use shell scripting to run srun many times in a loop (using & to background the jobs so that they all run at the same time). Can you list some advantages and disadvantages to this?
Solution
It does work, but it's fragile: if the login node dies, everything gets lost. It's actually more work than doing it properly (Array jobs: embarrassingly parallel execution). And Slurm knows all array jobs are the same, so it takes fewer resources to manage them - if someone scripts too many sruns, it can actually block other jobs from running when they could otherwise.
What’s next?
In the next tutorial on serial batch jobs, you will learn how to put the above-mentioned commands in a script, namely a batch script (a.k.a. submission script), which allows a multitude of jobs to run unattended.
Serial Jobs
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
Abstract
Batch scripts let you run work non-interactively, which is important for scaling. You create a batch script, which runs in the background. You come back later and see the results.
Example batch script, submit with sbatch the_script.sh:
#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --mem=4G
# Run your code here
python my_script.py
See the quick reference for complete list of options.

This tutorial covers the basics of serial jobs. With what you are learning so far, you can control a small amount of power of the cluster.
Prerequisites
Why batch scripts?
You learned, in Slurm: the queuing system, how all Triton users must do their computation by submitting jobs to the Slurm batch system to ensure efficient resource sharing. This lets you run many things at once without having to watch each one separately - the true power of the cluster.
A batch script is simply a shell script (remember Using the cluster from a shell?), where you put your resource requests and job steps.
Your first job script
A job script is simply a shell script (Bash). The first line in the script should be the shebang directive (#!) followed by the full path to the executable binary of the shell's interpreter, which is Bash in our case. What then follows are the resource requests, and then the job steps.
Let's take a look at the following script:
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem=100M
#SBATCH --output=pi.out
echo "Hello $USER! You are on node $HOSTNAME. The time is $(date)."
# For the next line to work, you need to be in the
# hpc-examples directory.
srun python3 slurm/pi.py 10000
Let's name it run-pi.sh (create a file using your editor of choice, e.g. nano; write the script above and save it).
The symbol # starts a comment in a bash script, but Slurm understands #SBATCH as parameters determining the resource requests. Here, we have requested a time limit of 5 minutes, along with 100 MB of RAM.
Resource requests are followed by job steps, which are the actual tasks to be done. Each srun within a Slurm script is a job step, and appears as a separate row in your history - which is useful for monitoring.
Having written the script, you need to submit the job to Slurm through the sbatch command. Since the command is python slurm/pi.py, you need to be in the hpc-examples directory from our sample project:
$ cd hpc-examples # wherever you have hpc-examples
$ sbatch run-pi.sh
Submitted batch job 52428672
Warning
You must use sbatch, not bash, to submit the job, since it is Slurm that understands the SBATCH directives, not Bash.
When the job enters the queue successfully, the response that the job has been submitted is printed in your terminal, along with the jobid assigned to the job.
You can check the status of your jobs using slurm q/slurm queue (or squeue -u $USER):
$ slurm q
JOBID PARTITION NAME TIME START_TIME STATE NODELIST(REASON)
52428672 debug run-pi.sh 0:00 N/A PENDING (None)
Once the job is completed successfully, the state changes to COMPLETED and the output is saved to pi.out in the current directory. You can also use wildcards like %u for your username and %j for the jobid in the output file name. See the documentation of sbatch for a full list of available wildcards.
Setting resource parameters
The resources were discussed in Slurm: the queuing system, and barely need to be mentioned again here - the point is they are the same. For example, you might use --mem=5G or --time=5:00:00. Always keep the reference page close for looking these up.
Checking your jobs
Once you submit your job, it goes into a queue. The two most useful commands to see the status of your jobs are slurm q/slurm queue and slurm h/slurm history (or squeue -u $USER and sacct -u $USER).
More information is in the monitoring tutorial.
Cancelling a job
You can cancel jobs with scancel JOBID. To obtain the job id, use the monitoring commands.
Full reference
The reference page contains it all, or expand it below.
Slurm quick ref
Command | Description
---|---|
sbatch SCRIPT.sh | submit a job to queue (see standard options below)
srun COMMAND | Within a running job script/environment: Run code using the allocated resources (see options below)
srun COMMAND | On frontend: submit to queue, wait until done, show output. (see options below)
sinteractive | Submit job, wait, provide shell on node for interactive playing (X forwarding works, default partition interactive). Exit shell when done. (see options below)
srun --pty bash | (advanced) Another way to run interactive jobs, no X forwarding but simpler. Exit shell when done.
scancel JOBID | Cancel a job in queue
salloc | (advanced) Allocate resources from frontend node. Use srun to run commands with the allocated resources; exit the shell when done. (see options below)
scontrol | View/modify job and slurm configuration
Command | Option | Description
---|---|---|
srun/sbatch/sinteractive | --time=HH:MM:SS | time limit
 | --time=DAYS-HH | time limit, days-hours
 | --partition=PARTITION | job partition. Usually leave off and things are auto-detected.
 | --mem-per-cpu=N | request n MB of memory per core
 | --mem=N | request n MB memory per node
 | --cpus-per-task=N | Allocate n CPUs for each task. For multithreaded jobs. (compare --ntasks: -c means the number of cores for each process started.)
 | --nodes=N-M | allocate minimum of n, maximum of m nodes.
 | --ntasks=N | allocate resources for and start n tasks (one task=one process started, it is up to you to make them communicate. However the main script runs only on first node, the sub-processes run with "srun" are run this many times.)
 | --job-name=NAME | short job name
 | --output=OUTPUTFILE | print output into file output
 | --error=ERRORFILE | print errors into file error
 | --exclusive | allocate exclusive access to nodes. For large parallel jobs.
 | --constraint=FEATURE | request feature (see slurm features for the list of available features)
 | --array=0-5,7,10-15 | Run job multiple times, use variable $SLURM_ARRAY_TASK_ID to adapt behavior
 | --gres=gpu | request a GPU, or --gres=gpu:n for multiple
 | --tmp=nG | request nodes that have disks, n GB available
 | --mail-type=TYPE | notify of events: BEGIN, END, FAIL, ALL, REQUEUE
 | --mail-user=YOUR@EMAIL | whom to send the email
srun | hostname | Print allocated nodes (from within script)
Exercises
The scripts you need for the following exercises can be found in our hpc-examples repository, which we discussed in Using the cluster from a shell. You can clone the repository by running git clone https://github.com/AaltoSciComp/hpc-examples.git. Doing this creates a local copy of the repository in your current working directory. This repository will be used for most of the tutorial exercises.
Serial-1: Basic batch job
Submit a batch job that just runs hostname and pi.py.
Remember to give pi.py some number of iterations as an argument.
Set time to 1 hour and 15 minutes, memory to 500MB.
Change the job's name and output file.
Check the output. Does the printed hostname match the one given by slurm history/sacct -u $USER?
Solution
Output from hostname should match the node shown in slurm history. Sbatch first assigns you a node depending on your requested resources, and then runs all commands included in the script.
Serial-2: Submitting and cancelling a job
Create a batch script which does nothing (or some pointless operation for a while), for example sleep 300 (this shell command does nothing for 300 seconds). Check the queue to see when it starts running. Then, cancel the job. What output is produced?
Solution
#!/bin/bash
echo "We are waiting"
sleep 300
echo "We are done waiting"
srun python3 slurm/pi.py 1000000
You can check when your job starts running with slurm q. Then you can cancel it with scancel JOBID, where JOBID can be found from the slurm q output. After cancelling the job, it should still produce an output file (named either slurm-JOBID.out or whatever you defined in the sbatch file). The output file also says the job was cancelled.
Serial-4: Modify script while it is running
Modifying scripts while a job is queued or running is a bad practice. Add sleep 180 into the Slurm script that runs pi.py. Submit the script and, while it is running, open pi.py with an editor of your choice and add the following line near the start of the script:
raise Exception()
Use slurm q to check when the job finishes, and check the output. What can you interpret from this?
Remove the created line after you have finished. You can also use git checkout -- pi.py (remember to give a proper relative path, depending on your current working directory!).
Solution
In this case we modified the Python code before it had begun executing (we added a line that raised an error while the sleep 180 was being executed). The code that a Slurm script executes is determined when the script is running; it is not locked in place when you submit the script. You should always make certain that you do not modify the code that a Slurm script will execute while the job is queued or running. Otherwise you can get errors and you cannot replicate your run.
Serial-5: Checking output
You can look at the output of files as your program is running. Let’s demonstrate.
Create a slurm script that runs the following program. This is a shell script which, every 10 seconds (for 30 iterations), prints the date:
for i in $(seq 30); do
date
sleep 10
done
Submit the job to the queue.
Log out from Triton. Log back in and use slurm queue/squeue -u $USER to check the job status.
Use cat NAME_OF_OUTPUTFILE to check the output periodically. You can use tail -f NAME_OF_OUTPUTFILE to view the progress in real time as new lines are added (Control-C to cancel).
Cancel the job once you're finished.
Solution
#!/bin/bash
#SBATCH --output test-check-output.out
for i in $(seq 30); do
date
sleep 10
done
We note that a new line appears about every 10 seconds. Note that delays might happen because of buffering, as the system tries to avoid doing too many small I/O operations:
$ tail -f test-check-output.out
Wed 7 Jun 10:49:58 EEST 2023
Wed 7 Jun 10:50:08 EEST 2023
(more every 10 seconds)
Serial-6: Constrain to a certain CPU architecture
Modify the script from exercise #1 to run on only one type of CPU using the --constraint option. Hint: check the Triton quick reference.
Solution
Simply add #SBATCH --constraint=X to your sbatch script, or give --constraint=X to srun as an additional argument. For example, to run only on Haswell CPUs you can add --constraint=hsw, or similarly for AMD Milan CPUs --constraint=milan. This also works identically for GPUs.
Serial-7: Why you use sbatch, not bash
(Advanced) What happens if you submit a batch script with bash instead of sbatch? Does it appear to run? Does it use all the Slurm options?
Solution
It looks like it runs, but it is actually only running on the login node! If you used srun python3 slurm/pi.py 10000, then it would request a Slurm allocation, but not use any of the #SBATCH parameters, so it might not request the resources you need:
$ bash run-pi.sh
Calculating Pi via 10000 stochastic trials
{"successes": 7815, "pi_estimate": 3.126, "iterations": 10000}
(advanced) Serial-8: Interpreters other than bash
(Advanced) Create a batch script that runs in another language using a different #! line. Does it run? What are some of the advantages and problems here?
Solution
Using another language to run your sbatch script is entirely possible. For example, if you are more used to writing scripts in zsh compared to bash, you could use #!/bin/zsh. You could even use something completely different from a shell. For example, using #!/usr/bin/env python3 would let you write Python code directly in the sbatch script. This is mostly an interesting curiosity, however, and is not usually practical.
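As a small illustration (a sketch, not a recommendation), a Python batch script works because #SBATCH lines are comments in Python too:
#!/usr/bin/env python3
#SBATCH --time=00:05:00
#SBATCH --mem=100M
# Ordinary Python code, executed on the allocated node.
import platform
print("Hello from", platform.node())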
(advanced) Serial-9: Job environment variables.
Either make an sbatch script that runs the command env | sort, or use srun env | sort. The env utility prints all environment variables, and sort sorts them (and | connects the output of env to the input of sort).
This will show all of the environment variables that are set in the job. Note the ones that start with SLURM_. Notice how they reflect the job parameters. You can use these in your jobs if needed (for example, a job that will adapt to the number of allocated CPUs).
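For example, here is a sketch of a script that passes its CPU allocation to an OpenMP-style program via the standard SLURM_CPUS_PER_TASK variable (the program name is a placeholder):
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=500M
#SBATCH --cpus-per-task=4
# Tell the program how many threads to use, based on what
# Slurm actually allocated, instead of hard-coding a number.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_openmp_program   # placeholder program name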
What’s next?
There are various tools one can use to do job monitoring.
Monitoring job progress and job efficiency
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
Abstract
You must always monitor jobs to make sure they are using all the resources you request.
Test scaling: double the resources; if the job doesn't run almost twice as fast, it's not worth it.
seff JOBID shows the efficiency and performance of a single job.
slurm queue shows waiting and running jobs (this is a custom command).
slurm history shows completed jobs (also a custom command).
GPU efficiency: a job's comment field shows GPU performance info (custom setup at Aalto); sacct -j JOBID -o comment -p shows this.
Introduction
When running jobs, one usually wants to do monitoring at various different stages:
Firstly, when a job is submitted, one wants to monitor the position of the job in the queue and the expected starting time for the job.
Secondly, when the job is running, one wants to monitor the job's state and how the simulation is performing.
Thirdly, once the job has finished, one wants to monitor the job's performance and resource usage.
There are various tools available for each of these steps.
See also
Please ensure you have read Interactive jobs and Serial Jobs before you proceed with this tutorial.
Monitoring during queueing
The command slurm q/slurm queue (or squeue -u $USER) can be used to monitor the status of your jobs in the queue. An example output is given below:
$ slurm q
JOBID PARTITION NAME TIME START_TIME STATE NODELIST(REASON)
60984785 interacti _interactive 0:29 2021-06-06T20:41 RUNNING pe6
60984796 batch-csl hostname 0:00 N/A PENDING (Priority)
Here the output columns are as follows:
JOBID shows the id number that Slurm has assigned for your job.
PARTITION shows the partition(s) that the job has been assigned to.
NAME shows the name of the submission script / job step / command.
TIME shows the amount of time the job has run so far.
START_TIME shows the start time of the job. If the job isn't currently running, Slurm will try to form an estimate of when the job will run.
STATE shows the state of the job. Usually it is RUNNING or PENDING.
NODES shows the names of the nodes where the program is running. If the job isn't running, Slurm tries to give a reason why the job is not running.
When submitting a job, one often wants to see if the job starts successfully. This can be made easier by running slurm w q/slurm watch queue (or watch -n 15 squeue -u $USER).
This opens a watcher that prints the output of slurm queue every 15 seconds. The watcher can be closed with <CTRL> + C. Do remember to close the watcher when you're not watching the output interactively.
To see all of the information that Slurm sees, one can use the command scontrol show -d jobid JOBID.
The slurm queue command is a wrapper built around the squeue command. One can also use squeue directly to get more information on the job's status. See squeue's documentation for more information.
There are other slurm subcommands that you can use to monitor the cluster status, job history etc. A list of examples is given below:
Slurm status info reference
Command | Description
---|---|
slurm q ; slurm qq | Status of your queued jobs (long/short)
slurm partitions | Overview of partitions (A/I/O/T=active,idle,other,total)
slurm cpus PARTITION | list free CPUs in a partition
slurm history [1day,2hour,...] | Show status of recent jobs
seff JOBID | Show percent of mem/CPU used in job. See Monitoring.
sacct -j JOBID -o comment -p | Show GPU efficiency
slurm j JOBID | Job details (only while running)
slurm full | Show status of all jobs
sacct | Full history information (advanced, needs args)
Full slurm command help:
$ slurm
Show or watch job queue:
slurm [watch] queue show own jobs
slurm [watch] q show user's jobs
slurm [watch] quick show quick overview of own jobs
slurm [watch] shorter sort and compact entire queue by job size
slurm [watch] short sort and compact entire queue by priority
slurm [watch] full show everything
slurm [w] [q|qq|ss|s|f] shorthands for above!
slurm qos show job service classes
slurm top [queue|all] show summary of active users
Show detailed information about jobs:
slurm prio [all|short] show priority components
slurm j|job show everything else
slurm steps show memory usage of running srun job steps
Show usage and fair-share values from accounting database:
slurm h|history show jobs finished since, e.g. "1day" (default)
slurm shares
Show nodes and resources in the cluster:
slurm p|partitions all partitions
slurm n|nodes all cluster nodes
slurm c|cpus total cpu cores in use
slurm cpus cores available to partition, allocated and free
slurm cpus jobs cores/memory reserved by running jobs
slurm cpus queue cores/memory required by pending jobs
slurm features List features and GRES
Examples:
slurm q
slurm watch shorter
slurm cpus batch
slurm history 3hours
Other advanced commands (many require lots of parameters to be useful):
Command | Description
---|---|
squeue | Full info on queues
sinfo | Advanced info on partitions
sinfo -N | List all nodes
Monitoring a job while it is running
As the most common way of using HPC resources is to run non-interactive jobs, it is usually a good idea to make certain that the program that will be run will produce some output that can be used to monitor the jobs’ progress.
The typical way of monitoring progress is to add print statements that produce output to standard output. This output is then redirected to the Slurm output file (-o FILE, default slurm-JOBID.out), where it can be read by the user. This file is updated while the job is running, but after some delay (every few KB written) because of buffering.
It is important to differentiate between different types of output:
Monitoring output is usually print statements, and it describes what the program is doing (e.g. "Loading data", "Running iteration 31"), what the state of the simulation is (e.g. "Total energy is 4.232 MeV", "Loss is 0.432"), and gives timing information (e.g. "Iteration 31 took 182s"). This output can then be used to see whether the program works, whether the simulation converges, and how long different calculations take.
Debugging output is similar to monitoring output, but it is usually more verbose and writes out the internal state of the program (e.g. values of variables). This is usually required during the development stage of a program, but once the program works and longer simulations are needed, printing debugging output is not recommended.
Checkpoint output can be used to resume the current state of the simulation in the case of unexpected situations such as bugs, network problems or hardware failures. Checkpoints should be in a binary format, as this keeps the accuracy of the floating-point numbers intact. In big simulations checkpoints can be large, so the frequency of taking checkpoints should not be too high. In iterative processes, e.g. Markov chains, taking checkpoints can be very quick and can be done more frequently. In smaller applications it is usually good to take a checkpoint when the program starts a different phase of the simulation (e.g. plotting after simulation). This minimizes the loss of simulation time due to programming bugs.
Simulation output is what the program outputs when the simulation is done. When doing long simulations it is important to consider which parameters you want to output. One should include all parameters that might be needed, so that the simulations do not need to be run again. With time series output this is even more important, as e.g. averages and statistical moments cannot necessarily be recalculated after the simulation has ended. It is usually a good idea to save a checkpoint at the end as well.
When creating monitoring output it is usually best to write it in a human-readable format with human-readable quantities. This makes it easy to see the state of the program.
Checking job history after completion
The command slurm h/slurm history can be used to check the history of your jobs. Example output is given below:
$ slurm h
JobID JobName Start ReqMem MaxRSS TotalCPUTime WallTime Tasks CPU Ns Exit State Nodes
60984785 _interactive 06-06 20:41:31 500Mc - 00:01.739 00:07:36 none 1 1 0:0 CANC pe6
└─ batch * 06-06 20:41:31 500Mc 6M 00:01.737 00:07:36 1 1 1 0:0 COMP pe6
└─ extern * 06-06 20:41:31 500Mc 1M 00:00.001 00:07:36 1 1 1 0:0 COMP pe6
60984796 hostname 06-06 20:49:36 500Mc - 00:00.016 00:00:00 none 10 10 0:0 CANC csl[3-6,9,14,17-18,20,23]
└─ extern * 06-06 20:49:36 500Mc 1M 00:00.016 00:00:01 10 10 10 0:0 COMP csl[3-6,9,14,17-18,20,23]
Here the output columns are as follows:
JobID shows the id number that Slurm has assigned for your job.
JobName shows the name of the submission script / job step / command.
Start shows the start time of the job.
ReqMem shows the amount of memory requested by the job. The format is an amount in megabytes or gigabytes followed by c or n for memory per core or memory per node, respectively.
MaxRSS shows the maximum memory usage of the job as calculated by Slurm. This is measured at set intervals.
TotalCPUTime shows the total CPU time used by the job. It shows the number of seconds the CPUs were at full utilization. For single CPU jobs, this should be close to the WallTime. For jobs that use multiple CPUs, this should be close to the number of CPUs reserved times WallTime.
WallTime shows the runtime of the job in seconds.
Tasks shows the number of MPI tasks reserved for the job.
CPU shows the number of CPUs reserved for the job.
Ns shows the number of nodes reserved for the job.
Exit State shows the exit code of the command. A successful run of the program should return 0 as the exit code.
Nodes shows the names of the nodes where the program ran.
The slurm history command is a wrapper built around the sacct command. One can also use sacct directly to get more information on the job's status. See sacct's documentation for more information.
For example, the command sacct --format=jobid,elapsed,ncpus,ntasks,state,MaxRss --jobs=JOBID will show the information indicated in the --format option (jobid, elapsed time, number of reserved CPUs, etc.). You can specify any field of interest to be shown using --format.
Checking CPU and RAM efficiency after completion
You can use seff JOBID to see what percentage of the available CPUs and RAM was utilized. Example output is given below:
$ seff 60985042
Job ID: 60985042
Cluster: triton
User/Group: tuomiss1/tuomiss1
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 00:00:29
CPU Efficiency: 90.62% of 00:00:32 core-walltime
Job Wall-clock time: 00:00:16
Memory Utilized: 1.59 MB
Memory Efficiency: 0.08% of 2.00 GB
If your processor usage is far below 100%, your code may not be working correctly. If your memory usage is far below 100% or above 100%, you might have a problem with your RAM requirements. You should set the RAM limit to be a bit above the RAM that you have utilized.
You can also monitor individual job steps by calling seff with the syntax seff JOBID.JOBSTEP.
Important
When making job reservations it is important to distinguish between requirements for the whole job (such as --mem) and requirements for each individual task/cpu (such as --mem-per-cpu). E.g. requesting --mem-per-cpu=2G with --ntasks=2 and --cpus-per-task=4 will create a total memory reservation of (2 tasks)*(4 cpus / task)*(2GB / cpu) = 16GB.
Monitoring a job’s GPU utilization
See also
GPU computing. We will talk about how to request GPUs later, but it’s kept here for clarity.
When running a GPU job, you should check that the GPU is being fully utilized.
When your job has started, you can ssh to the node and run nvidia-smi. The GPU utilization should be close to 100%.
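As a sketch, the check could look like this (NODENAME is whatever node slurm queue shows for your job):
$ slurm q            # find the node your job is running on
$ ssh NODENAME       # e.g. the node listed under NODELIST
$ nvidia-smi         # check the GPU utilization column
$ exit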
Once the job has finished, you can use slurm history to obtain the jobID and run:
$ sacct -j JOBID -o comment -p
{"gpu_util": 99.0, "gpu_mem_max": 1279.0, "gpu_power": 204.26, "ncpu": 1, "ngpu": 1}|
This also shows the GPU utilization.
If the GPU utilization of your job is low, you should check whether its CPU utilization is close to 100% with seff JOBID. A high CPU utilization with a low GPU utilization can indicate that the CPUs are trying to keep the GPU occupied with calculations, but the workload is too much for the CPUs, so the GPU is not constantly working.
Increasing the number of CPUs you request can help, especially in tasks that involve data loading or preprocessing, but your program must know how to utilize the CPUs.
However, you shouldn't request too many CPUs either: there wouldn't be enough CPUs for everyone to use the GPUs, and the GPUs would go to waste (all of our nodes have 4-12 CPUs for each GPU).
Exercises
The scripts you need for the following exercises can be found in our hpc-examples repository, which we discussed in Using the cluster from a shell. You can clone the repository by running git clone https://github.com/AaltoSciComp/hpc-examples.git. Doing this creates a local copy of the repository in your current working directory. This repository will be used for most of the tutorial exercises.
Monitoring-1: Adding more verbosity into your scripts
echo is a shell command which prints something - the equivalent of "print debugging". date is a shell command that prints the current date and time. It is useful for getting timestamps.
Modify one of the scripts from Serial Jobs with a lot of echo MY LINE OF TEXT commands to be able to verify what it's doing. Check the output.
Now change the script and add a date command below each echo command. Run the script and check the output. What do you see?
Now change the script, remove the echos, and add set -x below the #SBATCH comments. Run the script again. What do you see?
Solution
Using echo commands is a good way of verifying which part of the script is being executed. Using date commands in your script is a good way of checking when something was executed. Using set -x will cause the shell to print every command it executes, before it executes it. It is useful for debugging complex scripts with if-else clauses, where you might not know what exactly is being executed:
#!/bin/bash
#SBATCH --time=0:10:00
set -x
srun python3 slurm/pi.py 10000
The output (notice the + srun python3 ... line; this is automatically printed right before the command runs):
$ cat slurm-19207417.out
+ srun python3 slurm/pi.py 10000
Calculating pi via 10000 stochastic trials
{"pi_estimate": 3.126, "iterations": 10000, "successes": 7815}
Monitoring-2: Basic monitoring example
Using our standard pi.py example:
Create a slurm script that runs the algorithm with 100000000 (\(10^8\)) iterations. Submit it to the queue and use slurm queue, slurm history and seff to monitor the job's performance.
Add multiple job steps (separate srun lines), each of which runs the algorithm pi.py with an increasing number of iterations (from the range 100 - 10000000 (\(10^7\))). How does this appear in slurm history? (A sketch of such a script is shown below.)
Monitoring-3: Using seff
Continuing from the example above, use seff to check the performance of individual job steps. Can you explain why the CPU utilization numbers change between steps?
This is really one of the most important take-aways from this lesson.
Solution
Using seff JOBID.STEPID allows you to check the efficiency of specific steps. You should see that steps with a low number of iterations had very low CPU efficiency, while those with higher numbers of iterations had better efficiency. The important thing to note here is that each srun step has to finish before the next one can start. This means that if you have steps with very different resource requirements in one job, a lot of the resources you requested will go to waste.
Monitoring-4: Multiple processors
The script pi.py has been written so that it can be run using multiple processors. Run the script with multiple processors and \(10^8\) iterations with:
$ srun --cpus-per-task=2 python pi.py --nprocs=2 100000000
After you have run the script, do the following:
Use slurm history to check the TotalCPUTime and WallTime. Compare them to the timings for the single CPU run with \(10^8\) iterations.
Use seff to check the CPU performance of the job.
Monitoring-5: No output
You submit a job, and it should be writing some stuff to the output. But nothing is appearing in the output file. What’s wrong?
Solution
If it’s only been a few minutes, output is probably still buffered. This happens to avoid writing to disk for every line, which would otherwise slow down a program a lot.
FYI, interactive programs are usually line-buffered (displayed in the terminal after each line) and non-interactive programs are usually fully buffered (output after every few kB). Search "[language] flush buffers" to see how to force it to write sooner - but remove this after debugging!
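For example, Python can be forced to write output immediately with its -u flag, and many compiled programs can be line-buffered with the stdbuf utility (the program name below is a placeholder):
# Python: run unbuffered so print() output appears immediately
srun python3 -u slurm/pi.py 10000000
# Many C/stdio programs: force line-buffered stdout
srun stdbuf -oL ./my_program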
What’s next?
Next tutorial is about different ways of doing parallel computing.
Parallel computing: different methods explained
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
Parallel computing is what HPC is really all about: processing things on more than one processor at once. By now, you should have read all of the previous tutorials.
Abstract
You need to figure out what parallelization paradigm your program uses; otherwise you won't know which options to use.
Embarrassingly parallel: use options for array jobs.
Multithreaded (OpenMP) or multiple processes (like Python’s multiprocessing): use options for shared memory parallelism.
MPI: use options for MPI parallelism.
GPU: use options for GPUs.
You must always monitor jobs to make sure they are using all the resources you request (seff JOBID).
If you aren't fully sure of how to scale up, contact us, the Research Software Engineers, early.

We are working to get access to the login node. This is the gateway to all the rest of the cluster.
Parallel programming models
Parallel programming is used to create programs that can execute instructions on multiple processors at the same time. Most of our users who run their programs in parallel utilize existing parallel execution features present in their programs, and thus do not need to learn how to create parallel programs. But even when one is just running programs in parallel, it is important to understand the different models of parallel execution.
The main models of parallel programming are:
An embarrassingly parallel problem can be split into completely independent jobs that can be executed separately, with no communication between the individual jobs.
More often than not, scientific problems involve running a single program again and again with different datasets or parameters. Slurm has a structure called job array, which enables users to easily submit a large amount of such jobs.
Any program can be run in an embarrassingly parallel way, as long as the problem at hand can be split into multiple independent jobs.
Each job in an array is identical to every other job, but each independent job gets its own unique ID.
Workloads that utilize this model should request what a single job needs and the number of array jobs that the whole array should have.
See: array jobs.
The array job runs independently across the cluster.
Shared memory (or multithreaded/multiprocess) parallel programs run multiple processes / threads on the same machine. As the name suggests, all of the computer’s memory has to be accessible to all of the processes / threads.
Thus programs that utilize this model should request one node, one task and multiple CPUs.
Example applications that utilize this model: Matlab (internally & parallel pool), R (internally & parallel-library), Python (numpy internally & threading/multiprocessing-modules), OpenMP applications, BLAS libraries, FFTW libraries, typical multithreaded/multiprocess parallel desktop programs.
See: shared-memory parallelism.
The shared memory job runs across one node - since that’s what shares memory.
MPI parallelism utilizes MPI (Message Passing Interface) libraries for communication between MPI tasks. These MPI tasks work in a collective fashion and each task executes its part of the same program.
Communication between MPI tasks is passed through the high-speed interconnects between different compute nodes, and this allows for programs that can utilize thousands of CPU cores.
Almost all large-scale scientific programs utilize MPI. MPI programs are usually quite complex and written for a specific use case as the nature of the collective operations depends on the problem at hand.
Programs that utilize this model should request single/multiple nodes with multiple tasks each. You should not request multiple CPUs per task.
Example applications that utilize this model: CP2K, GPAW, LAMMPS, OpenFoam. See: MPI parallelism.
The MPI job can communicate across nodes.
Parallel execution in GPUs is not parallel in the traditional sense where multiple CPUs run different processes. Instead GPU parallelism leverages GPGPUs (general-purpose graphics processing units) that have thousands of compute cores inside them. When running suitable problems GPUs can be substantially faster than CPUs.
Programs that utilize GPUs are written in parts, where some part of the program executes on the CPU and the other is executed on the GPU. The part that runs on the CPU usually does things like reading input and writing output, while the GPU part is more focused on doing numerical calculations. Often multiple CPUs are needed per GPU to do things such as data preprocessing, just to keep the GPU occupied.
A typical CPU program cannot utilize GPUs unless it has been designed to use them. Additionally programs that utilize GPUs cannot utilize multiple GPUs unless they have been designed for it.
Programs that utilize GPUs should request a single node, a single task, (optionally) multiple CPUs and a GPU.
See: GPU computing.
Does my code parallelize?
Normal serial code can’t just be run in parallel without modifications. As a user it is your responsibility to understand what parallel model implementation your code has, if any.
When deciding whether using parallel programming is worth the effort, one should be mindful of Amdahl's law and Gustafson's law. All programs have some parts that can only be executed serially, and thus the speedup one can get from parallel execution depends on how much of the program's execution can be done in parallel.
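Amdahl's law makes this concrete: if a fraction \(p\) of a program's runtime can be parallelized, the speedup on \(N\) processors is bounded by
\[ S(N) = \frac{1}{(1 - p) + p/N} \]
so with \(p = 0.9\), even infinitely many processors give at most a 10x speedup.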
Thus if your program runs mainly serially but has a small parallel part, running it in parallel might not be worth it. Sometimes doing data parallelism with e.g. array jobs is a much more fruitful approach.
Another important note regarding parallelism is that applications scale well only up to some upper limit, which depends on the implementation, the size and type of problem you solve, and some other factors. The best practice is to benchmark your code on different numbers of CPU cores before you start actual production runs.
If you want to run some program in parallel, you have to know something about it - is it shared memory or MPI? A program doesn’t magically get faster when you ask more processors if it’s not designed to.
Combining different parallel execution models
Different parallel execution models can be combined if your program supports them. Below a few common situations are listed:
Embarrassingly parallel everything
As running programs in an embarrassingly parallel fashion is not a feature of the program, but a feature of the workflow itself, any program can be run in an embarrassingly parallel fashion if needed.
One can run shared-memory parallel, MPI parallel and GPU parallel jobs as array jobs as well. Each individual job will get its own resources.
Hybrid parallelism
When MPI and shared memory parallelism are done by the same application it is usually called hybrid parallelization. Programs that utilize this model can require both multiple tasks and multiple CPUs per task.
For example, CP2K compiled for the psmp target has hybrid parallelization enabled, while the popt target has only MPI parallelization enabled. The best ratio between MPI tasks and CPUs per task depends on the program and needs to be measured.
Multi-node parallelism without MPI
Some programs can run with multiple nodes in parallel, but they do not use MPI for communication between nodes. Resources for these programs are reserved in a similar fashion to the MPI programs, but the program launch is usually done by scripts that run different instructions on different machines. The setup depends on the program and can be complex.
See also
The Research Software Engineers can help in all aspects of parallel computing - we’d recommend anyone getting to this point set up a consultation to make sure your work is as efficient as it can be.
What’s next?
The next tutorial is about array jobs.
Array jobs: embarrassingly parallel execution
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
Abstract
Array jobs allow you to submit one job that runs many times with the same Slurm parameters.
Submit with the --array= Slurm argument, giving array indexes like --array=1-10,12-15.
The $SLURM_ARRAY_TASK_ID environment variable tells a job which array index it is.
There are different templates to use below, which you can adapt to your task.
If you aren't fully sure of how to scale up, contact us, the Research Software Engineers, early.
More often than not, scientific problems involve running a single program again and again with different datasets or parameters.
When there is no dependency or communication among the individual program runs, these runs can be executed in parallel as separate Slurm jobs. This kind of parallelism is called embarrassingly parallel.
Slurm has a structure called job array, which enables users to easily submit and run several instances of the same Slurm script independently in the queue.

Array jobs let you control a large amount of the cluster. In Parallel computing: different methods explained, we will see another way.
Introduction
Array jobs allow you to parallelize your computations. They are used when you need to run the same job many times with only slight changes among the jobs. For example, you need to run 1000 jobs each with a different seed value for the random number generator. Or perhaps you need to apply the same computation to a collection of data sets. These can be done by submitting a single array job.
A Slurm job array is a collection of jobs that are to be executed with identical
parameters. This means that there is one single batch script that is to be run
as many times as indicated by the --array
directive, e.g.:
#SBATCH --array=0-4
creates an array of 5 jobs (tasks) with index values 0, 1, 2, 3, 4.
The array tasks are copies of the submitted batch script that are automatically submitted to Slurm. Slurm provides a unique environment variable SLURM_ARRAY_TASK_ID to each task, which can be used for handling input/output files for each task.
--array
via the command line
You can also pass the --array
option as a command-line argument to
sbatch
. This can be great for controlling things without
editing the script file.
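For example, assuming a hypothetical script my_script.sh, the following submits it as a 16-task array without editing the file:
$ sbatch --array=0-15 my_script.sh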
Important
When running an array job you're basically running identical copies of a single job. Thus it is increasingly important to know how your code behaves with respect to the file system:
Does it use libraries/environment stored in the work directory?
How much input data does it need?
How much output data does the job create?
For example, running an array job with hundreds of workers that uses a Python environment stored in the work disk can inadvertently cause a lot of filesystem load as there will be hundreds of thousands of file calls.
If you’re unsure how your job will behave, ask us Research Software Engineers for help for help.
Your first array job
Let’s see a job array in action. Lets create a file called
array_example.sh
and write it as follows.
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=200M
#SBATCH --output=array_example_%A_%a.out
#SBATCH --array=0-15
# You may put the commands below:
# Job step
srun echo "I am array task number" $SLURM_ARRAY_TASK_ID
Submitting the job script to Slurm with sbatch array_example.sh
, you will get the message:
Submitted batch job 60997836
The job id in the message is that of the primary array job. This is common for all of the jobs in the array. In addition, each individual job is given an array task id.
As now we’re submitting multiple jobs simultaneously, each job needs an
individual output file or the outputs will overwrite each other. By default,
Slurm will write the outputs to files named
slurm-${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}.out
. This can be overridden using the --output=FILENAME parameter, where you can use the wildcard %A for the job id and %a for the array task id.
Once the jobs are completed, the output files will be created in the directory you submitted from (the wildcard %u is also available if you want your user name in the file name):
$ ls
array_example_60997836_0.out array_example_60997836_12.out array_example_60997836_15.out array_example_60997836_3.out array_example_60997836_6.out array_example_60997836_9.out
array_example_60997836_10.out array_example_60997836_13.out array_example_60997836_1.out array_example_60997836_4.out array_example_60997836_7.out array_example.sh
array_example_60997836_11.out array_example_60997836_14.out array_example_60997836_2.out array_example_60997836_5.out array_example_60997836_8.out
You can cat
one of the files to see the output of each task:
$ cat array_example_60997836_11.out
I am array task number 11
Important
The array indices do not need to be sequential. For example, if after
running an array job you find out that tasks 2 and 5 failed, you can
relaunch just those jobs with --array=2,5
.
You can even simply pass the --array
option as a command-line argument to
sbatch
.
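For the example above, relaunching only the failed tasks would then look like:
$ sbatch --array=2,5 array_example.sh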
More examples
The following examples give you an idea on how to use job arrays for different
use cases and how to utilize the $SLURM_ARRAY_TASK_ID
environment
variable. In general,
You need some map of numbers to configuration. This might be files on the filesystem, a hardcoded mapping in your code, or some configuration file.
You generally want the mapping to not get lost. Be careful about running some jobs, changing the mapping, and running more: you might end up with a mess!
Reading input files
In many cases, you would like to process several data files. That is, pass
different input files to your code to be processed. This can be achieved by
using $SLURM_ARRAY_TASK_ID
environment variable.
In the example below, the array job gives the program different input files,
based on the value of the $SLURM_ARRAY_TASK_ID
:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=1G
#SBATCH --array=0-29
# Each array task runs the same program, but with a different input file.
srun ./my_application -input input_data_${SLURM_ARRAY_TASK_ID}
Hardcoding arguments in the batch script
One way to pass arguments to your code is by hardcoding them in the batch script you want to submit to Slurm.
Assume you would like to run the pi estimation code for 5 different seed values, each for 2.5 million iterations. You could assign a seed value to each task in your job array and save each output to a file. Having calculated all the estimations, you could take the average of all the pi values to arrive at a more accurate estimate. An example of such a batch script pi_array_hardcoded.sh is as follows.
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=500M
#SBATCH --job-name=pi-array-hardcoded
#SBATCH --output=pi-array-hardcoded_%a.out
#SBATCH --array=0-4
case $SLURM_ARRAY_TASK_ID in
0) SEED=123 ;;
1) SEED=38 ;;
2) SEED=22 ;;
3) SEED=60 ;;
4) SEED=432 ;;
esac
srun python slurm/pi.py 2500000 --seed=$SEED > pi_$SEED.json
Save the script and submit it to Slurm:
$ sbatch pi_array_hardcoded.sh
Submitted batch job 60997871
Once finished, 5 Slurm output files and 5 application output files will be created in your current directory, each containing the pi estimate, the total number of iterations, and the total number of successes:
$ cat pi_22.json
{"successes": 1963163, "pi_estimate": 3.1410608, "iterations": 2500000}
Reading parameters from one file
Another way to pass arguments to your code via script is to save the arguments to a file and have your script read the arguments from it.
Drawing on the previous example, let's assume you now want to run pi.py with different iteration counts. You can create a file, say iterations.txt, and write all the values to it, e.g.:
$ cat iterations.txt
100
1000
50000
1000000
You can modify the previous script to have it read the iterations.txt
one line at a time and pass it on to pi.py
. Here, sed is used to get each line. Alternatively, you can use any other command-line utility, e.g. awk. Do not worry if you don't know how sed works - a web search and man sed always help. Also note that the line numbers start at 1, not 0.
The script
pi_array_parameter.sh
looks like this:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=500M
#SBATCH --job-name=pi-array-parameter
#SBATCH --output=pi-array-parameter_%a.out
#SBATCH --array=1-4
n=$SLURM_ARRAY_TASK_ID
iteration=`sed -n "${n} p" iterations.txt` # Get n-th line (1-indexed) of the file
srun python slurm/pi.py ${iteration} > pi_iter_${iteration}.json
You can additionally do this procedure in a more complex way, e.g. read in multiple arguments from a csv file, etc.
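As a sketch of the CSV variant (assuming a hypothetical params.csv with an iteration count and a seed on each line), the relevant lines could look like this:
n=$SLURM_ARRAY_TASK_ID
line=`sed -n "${n} p" params.csv`          # get the n-th line (1-indexed) of the file
iterations=$(echo "$line" | cut -d, -f1)   # first column: iteration count
seed=$(echo "$line" | cut -d, -f2)         # second column: seed value
srun python slurm/pi.py ${iterations} --seed=${seed} > pi_${n}.json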
(Advanced) Two-dimensional array scanning
What if you wanted an array job that scanned a 2D array of points?
Well, you can map 1D to 2D via the following pseudo-code: x =
TASK_ID // N
(floor division) and y = TASK_ID % N
(modulo
operation). Then map these numbers into your grid. This can be
done in bash, but at this point you’d want to start thinking about
passing the SLURM_ARRAY_TASK_ID
variable into your code itself for
this processing.
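A minimal sketch in a batch script (the grid width N, the program name and its options are placeholders) could be:
#!/bin/bash
#SBATCH --array=0-24                    # a 5x5 grid of (x, y) points
N=5                                     # grid width
x=$(( SLURM_ARRAY_TASK_ID / N ))        # floor division gives one coordinate
y=$(( SLURM_ARRAY_TASK_ID % N ))        # modulo gives the other
srun ./my_application --x=$x --y=$y     # hypothetical program and options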
(Advanced) Grouping runs together in bigger chunks
If you have lots of short jobs (a few minutes each), array jobs may induce too much scheduling overhead and you will create a huge number of output files. In these cases you might want to combine multiple program runs into a single array task.
Important
A good target time for each array task is approximately 30 minutes, so please try to combine your tasks so that each job takes at least this long.
An easy workaround is to create a for-loop in your Slurm script. For example, if you want to run the pi script with 50 different seed values, you could run them in chunks of 10 for a total of 5 array tasks. This reduces the number of jobs by a factor of 10!
This method demands more knowledge of shell scripting, but the end result is a fairly simple Slurm script pi_array_grouped.sh that does what we need.
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=500M
#SBATCH --job-name=pi-array-grouped
#SBATCH --output=pi-array-grouped_%a.out
#SBATCH --array=0-4
# Let's create a new folder for our output files
mkdir -p json_files
CHUNKSIZE=10
n=$SLURM_ARRAY_TASK_ID
indexes=`seq $((n*CHUNKSIZE)) $(((n + 1)*CHUNKSIZE - 1))`
for i in $indexes
do
srun python slurm/pi.py 1500000 --seed=$i > json_files/pi_$i.json
done
Exercises
The scripts you need for the following exercises can be found in our
hpc-examples, which
we discussed in Using the cluster from a shell.
You can clone the repository by running
git clone https://github.com/AaltoSciComp/hpc-examples.git
. Doing this creates a local copy of the repository in your current working directory. This repository will be used for most of the tutorial exercises.
Array-1: Array jobs and different random seeds
Create a job array that uses the slurm/pi.py
to calculate a
combination of different iterations and seed values and save them
all to different files. Keep the standard output (#SBATCH
--output=FILE
) separate from the standard error (#SBATCH
--error=FILE
).
Array-2: Combine the outputs of the previous exercise.
You find a pi-aggregate.py
program in hpc-examples. Run this
and give all the output files as arguments. It will combine all
the statistics and give a more accurate value of \(\pi\).
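Assuming the output files from the previous exercise are named like pi_*.json, the invocation could look something like this (a sketch; check the script's own help for the exact interface):
$ python3 pi-aggregate.py pi_*.json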
Array-3: Reflect on array jobs in your work
Think about your typical work. How could you split it into trivial pieces that can be run with array jobs? When could you make individual jobs smaller, but run more of them as array jobs?
(Advanced) Array-4: Array jobs with advanced index selector
Make a job array which runs every other index, e.g. the array can be indexed as 1, 3, 5… (the sbatch manual page can be of help)
Solution
You can specify a step function with colon and a number after indices.
In this case it would be: --array=1-X:2
See also
If you aren’t fully sure of how to scale up, contact us Research Software Engineers early. We are great at making these types of workflows.
For more information, you can see the CSC guide on array jobs
Please check the quick reference when needed.
What’s next?
The next tutorial is about MPI parallelism.
MPI parallelism: multi-task programs
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
Abstract
Verify that your program can use MPI.
Compile to link with our MPI libraries. Remember to load the same modules in your Slurm script.
Use --nodes=1 and --ntasks=n to reserve \(n\) tasks for your job.
Start your application via srun if using a module-installed MPI, or mpirun if you have your own installation of MPI.
To spread tasks evenly across nodes, use --nodes=N and --ntasks-per-node=n to get \(N \cdot n\) tasks.
You must always monitor jobs to make sure they are using all the resources you request (seff JOBID).
If you aren't fully sure of how to scale up, contact us, the Research Software Engineers, early.

MPI parallelism lets you scale to many nodes on the cluster, at the cost of extra programming work.
What is MPI parallelism?
MPI, or Message Passing Interface, is a standard for communication between many tasks that collectively run a program in parallel. Programs using MPI can scale up to thousands of nodes.
Programs using MPI need to be written to utilize the MPI communication. Thus typical programs that are not written around MPI cannot use it without major modifications.
MPI programs typically work in the following way:
The same program is started as multiple separate tasks.
All tasks join a communication layer with MPI.
Each task gets its own rank (basically an ID number).
Based on their ranks, the tasks execute their part of the code and communicate with the other tasks. Rank 0 is usually the “main program” that prints output for monitoring.
After the program finishes, the communication layer is stopped.
When using module-installed MPI, the ranks automatically get their rank information from Slurm via a library called PMIx. If the MPI used is some other version, it might not connect with Slurm correctly.
Running a typical MPI program
Compiling an MPI program
For compiling/running an MPI job one has to pick one of the MPI library suites. There are various MPI libraries that all implement the same MPI standard. We recommend that you use our OpenMPI installation (openmpi/4.1.5). For information on other installed versions, see the MPI applications page.
Some libraries/programs might have an existing requirement for a certain MPI version. If so, use that version, or ask the administrators to create a build of the library that depends on the MPI version you require.
Warning
Different versions of MPI are not compatible with each other. Code built with a certain MPI version will run correctly only with that version, so you need to load the same module when running the code as you used when compiling it.
Also, the MPI libraries are usually linked to Slurm and the network drivers. Thus, when Slurm or driver versions are updated, some older versions of MPI might break. If you're still using such versions, let us know. If you're just starting a new project, use our recommended MPI libraries.
Reserving resources for MPI programs
For basic use of MPI programs, you will need to use the --nodes=1 and --ntasks=N options to specify the number of MPI workers.
The --nodes=1
option is recommended so that your jobs will run in the
same machine for maximum communication efficiency. You can also run
without it, but this can result in worse performance.
In many cases you might require more tasks than one node has CPUs.
When this is the case, it is recommended to split the number of
workers evenly among the nodes. To do this, one can use
--nodes=N
and --ntasks-per-node=n
. This would give you
\(N \cdot n\) tasks in total.
Each task will get a default of 1 CPU. See section on hybrid parallelisation for information on whether you can give each task more than one CPU.
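For example, the reservation part of a batch script that spreads 40 tasks evenly over two nodes could look like this:
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=20   # 2 * 20 = 40 MPI tasks in total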
Running an example MPI program
The scripts you need for the following exercises can be found in our
hpc-examples, which
we discussed in Using the cluster from a shell.
You can clone the repository by running
git clone https://github.com/AaltoSciComp/hpc-examples.git
. Doing this creates a local copy of the repository in your current working directory. This repository will be used for most of the tutorial exercises.
For this example, let's consider the pi-mpi.c example in the slurm folder.
It estimates pi with Monte Carlo methods and
can utilize multiple MPI tasks for calculating the trials.
First off, we need to compile the program with a suitable OpenMPI version. Let’s use the
recommended version openmpi/4.1.5
:
$ module load openmpi/4.1.5
$ mpicc -o pi-mpi pi-mpi.c
The program can now be run with srun ./pi-mpi N
, where N
is the number of
iterations to be done by the algorithm.
Let’s ask for resources and run the program with two processes using srun
:
$ srun --nodes=1 --ntasks=2 --time=00:10:00 --mem=500M ./pi-mpi 1000000
This worked because we had the correct modules already loaded. With a Slurm script that sets the requirements and loads the correct modules, this becomes easier:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=500M
#SBATCH --output=pi.out
#SBATCH --nodes=1
#SBATCH --ntasks=2
module load openmpi/4.1.5
srun ./pi-mpi 1000000
Let’s call this script pi-mpi.sh
. You can submit it with:
$ sbatch pi-mpi.sh
Special cases and common pitfalls
MPI workers do not see each other
When using our installations of MPI, the ranks automatically get their rank information from Slurm via a library called PMIx. If the MPI used is some other version, it might not connect with Slurm correctly.
If you have your own installation of MPI you might try setting
export SLURM_MPI_TYPE=pmix_v2
in your job before calling srun
.
This will tell Slurm to use PMIx for connecting with the MPI installation.
Setting a constraint for a specific CPU architecture
The number of CPUs/tasks one can specify for a single parallel job usually depends on the underlying algorithm. In many codes, such as many finite-difference codes, the workers are arranged in a grid-like structure. The user of such codes then chooses the dimensions of the simulation grid, i.e. how many workers are in the x-, y-, and z-dimensions.
For best performance one should reserve half or full nodes when possible. In heterogeneous clusters this can be a bit more complicated, as different CPUs can have different numbers of cores.
In Triton CPU partitions there are machines with 24, 28 and 40 CPUs. See the list of available nodes for more information.
However, one can make the reservations easier by specifying a CPU architecture
with --constraint=ARCHITECTURE
. This tells Slurm to look for nodes that
satisfy a specific feature. To list available features, one can use
slurm features
.
For example, one could limit the code to the Haswell-architecture with the following script:
#!/bin/bash
#SBATCH --time=00:10:00 # takes 5 minutes all together
#SBATCH --mem-per-cpu=200M # 200MB per process
#SBATCH --nodes=1 # 1 node
#SBATCH --ntasks-per-node=24 # 24 processes as that is the number in the machine
#SBATCH --constraint=hsw # set constraint for processor architecture
module load openmpi/4.1.5 # NOTE: should be the same as you used to compile the code
srun ./pi-mpi 1000000
Monitoring performance
You can use seff JOBID
to see what percent of available CPUs and RAM was
utilized. Example output is given below:
$ seff 60985042
Job ID: 60985042
Cluster: triton
User/Group: tuomiss1/tuomiss1
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 00:00:29
CPU Efficiency: 90.62% of 00:00:32 core-walltime
Job Wall-clock time: 00:00:16
Memory Utilized: 1.59 MB
Memory Efficiency: 0.08% of 2.00 GB
If your processor usage is far below 100%, your code may not be working correctly. If your memory usage is far below 100% or above 100%, you might have a problem with your RAM requirements: set the RAM limit to a bit above the RAM you actually utilized.
You can also monitor individual job steps by calling seff
with the syntax
seff JOBID.JOBSTEP
.
Important
When making job reservations it is important to distinguish
between requirements for the whole job (such as --mem
) and
requirements for each individual task/cpu (such as --mem-per-cpu
).
E.g. requesting --mem-per-cpu=2G
with --ntasks=2
and --cpus-per-task=4
will create a total memory reservation of
(2 tasks)*(4 cpus / task)*(2GB / cpu)=16GB.
Hybrid parallelization aka. giving more than one CPU to each MPI task
When MPI and shared memory parallelism are done by the same application it is usually called hybrid parallelization. Programs that utilize this model can require both multiple tasks and multiple CPUs per task.
For example, CP2K compiled for the psmp target has hybrid parallelization enabled, while the popt target has only MPI parallelization. The best ratio between MPI tasks and CPUs per task depends on the program and needs to be measured.
Remember that the number of CPUs in a machine is hardware dependent.
The total number of CPUs per node when you request --ntasks-per-node=n
and
--cpus-per-task=C
is \(n \cdot C\). This number needs to be equal or less than
the total number of CPUs in the machine.
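As a sketch (the program name is a placeholder, and whether OMP_NUM_THREADS applies depends on your program), a hybrid job giving each MPI task four CPUs could be reserved like this:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=6                   # 6 MPI tasks
#SBATCH --cpus-per-task=4                     # 4 CPUs per task: 6 * 4 = 24 CPUs in total
module load openmpi/4.1.5
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # common convention for OpenMP-based hybrid codes
srun ./my_hybrid_program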
Exercises
The scripts you need for the following exercises can be found in our
hpc-examples, which
we discussed in Using the cluster from a shell.
You can clone the repository by running
git clone https://github.com/AaltoSciComp/hpc-examples.git
. Doing this creates a local copy of the repository in your current working directory. This repository will be used for most of the tutorial exercises.
MPI parallelism 3: Your program
Think of your program. Do you think it can use MPI parallelism?
If you do not know, you can check the program’s documentation for words such as:
MPI
message-passing interface
mpirun
mpiexec
…
These usually point towards some method of MPI parallel execution.
What’s next?
The next tutorial is about GPU parallelism.
GPU computing
Videos
Videos of this topic may be available from one of our kickstart course playlists: 2023, 2022 Summer, 2022 February, 2021 Summer, 2021 February.
Abstract
Request a GPU with the Slurm option --gres=gpu:1 or --gpus=1 (some clusters need -p gpu or similar).
Select a certain type of GPU with e.g. --constraint='volta' (see the quick reference for names).
Monitor GPU performance with sacct -j JOBID -o comment -p.
For development, run jobs of 4 hours or less; they can run quickly in the gpushort queue.
If you aren't fully sure of how to scale up, contact us, the Research Software Engineers, early.

GPU nodes allow specialized types of work to be done massively in parallel.
What are GPUs and how do they parallelise calculations?
GPUs, short for graphics processing units, are massively-parallel processors that are optimized to perform numerical calculations in parallel. Due to this specialisation, GPUs can be substantially faster than CPUs when solving suitable problems.
GPUs are especially handy when dealing with matrices and vectors. This has allowed GPUs to become an indispensable tool in many research fields such as deep learning, where most of the calculations involve matrices.
The programs we normally write in common programming languages, e.g. C++, are executed by the CPU. To run a part of a program on a GPU, the program must do the following:
Specify a piece of code called a kernel, which contains the GPU part of the program and is compiled for the specific GPU architecture in use.
Transfer the data needed by the program from the RAM to GPU VRAM.
Execute the kernel on the GPU.
Transfer the results from GPU VRAM to RAM.
To help with this procedure special APIs (application programming interfaces) have been created. An example of such an API is CUDA toolkit, which is the native programming interface for NVIDIA GPUs.
On Triton, we have a large number of NVIDIA GPU cards from different generations and a single machine with AMD GPU cards. Triton GPUs are not the typical desktop GPUs, but specialized research-grade server GPUs with large memory, high bandwidth and specialized instructions. For scientific purposes, they generally outperform the best desktop GPUs.
See also
Please ensure you have read Interactive jobs and Serial Jobs before you proceed with this tutorial.
Running a typical GPU program
Reserving resources for GPU programs
Slurm keeps track of the GPU resources as generic resources (GRES) or trackable resources (TRES). They are basically limited resources that you can request in addition to normal resources such as CPUs and RAM.
To request GPUs on Slurm, you should use the --gres=gpu:1 or --gpus=1 flags.
You can also use syntax --gres=gpu:GPU_TYPE:1
, where GPU_TYPE
is a name chosen by the admins for the GPU. For example, --gres=gpu:v100:1
would give you a V100 card. See section on
reserving specific GPU architectures for more information.
You can request more than one GPU with --gres=gpu:G
, where G
is
the number of the requested GPUs.
Some GPUs are placed in a quick debugging queue. See section on reserving quick debugging resources for more information.
Note
Most GPU programs cannot utilize more than one GPU at a time. Before trying to reserve multiple GPUs you should verify that your code can utilize them.
Running an example program that utilizes a GPU
The scripts you need for the following exercises can be found in our
hpc-examples, which
we discussed in Using the cluster from a shell.
You can clone the repository by running
git clone https://github.com/AaltoSciComp/hpc-examples.git
. Doing this creates a local copy of the repository in your current working directory. This repository will be used for most of the tutorial exercises.
For this example, let's consider pi-gpu.cu in the slurm folder. It estimates pi with Monte Carlo methods and can utilize a GPU for calculating the trials.
This example is written in C++ and CUDA, so it needs to be compiled before it can be run.
To compile CUDA-based code for GPUs, let's load a cuda module and a newer compiler:
module load gcc/8.4.0 cuda
Now we should have a compiler and a CUDA toolkit loaded. After this we can compile the code with:
nvcc -arch=sm_60 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -o pi-gpu pi-gpu.cu
This monstrosity of a command is written like this because we want our code to be able to run on multiple different GPU architectures. For more information, see the section on setting compilation flags for GPU architectures.
Now we can run the program using srun
:
srun --time=00:10:00 --mem=500M --gres=gpu:1 ./pi-gpu 1000000
This worked because we had the correct modules already loaded. With a Slurm script that sets the requirements and loads the correct modules, this becomes easier:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=500M
#SBATCH --output=pi-gpu.out
#SBATCH --gres=gpu:1
module load gcc/8.4.0 cuda
./pi-gpu 1000000
Note
If you encounter problems with CUDA libraries, see the section on missing CUDA libraries.
Special cases and common pitfalls
Monitoring efficient use of GPUs
When running a GPU job, you should check that the GPU is being fully utilized.
When your job has started, you can ssh to the node and run nvidia-smi. The GPU utilization should be close to 100%.
Once the job has finished, you can use slurm history
to obtain the
jobID
and run:
$ sacct -j JOBID -o comment -p
{"gpu_util": 99.0, "gpu_mem_max": 1279.0, "gpu_power": 204.26, "ncpu": 1, "ngpu": 1}|
This also shows the GPU utilization.
If the GPU utilization of your job is low, you should check whether
its CPU utilization is close to 100% with seff JOBID
. Having a high
CPU utilization and a low GPU utilization can indicate that the CPUs are
trying to keep the GPU occupied with calculations, but the workload
is too much for the CPUs and thus GPUs are not constantly working.
Increasing the number of CPUs you request can help, especially in tasks that involve data loading or preprocessing, but your program must know how to utilize the CPUs.
However, you shouldn’t request too many CPUs: There wouldn’t be enough CPUs for everyone to use the GPUs and they would go to waste (all of our nodes have 4-12 CPUs for each GPU).
Reserving specific GPU types
You can restrict yourself to a certain type of GPU card by using the --constraint option. For example, to restrict the submission to Pascal generation GPUs only, you can use --constraint='pascal'.
For choosing between multiple generations, you can use the | character between generations. For example, to restrict the submission to the Volta or Ampere generations, you can use --constraint='volta|ampere'.
Remember to use the quotes since |
is the shell pipe.
To see what GPU resources are available, run slurm features
or
sinfo -o '%50N %18F %26f %30G'
.
An alternative way is to use the syntax --gres=gpu:GPU_TYPE:1, where GPU_TYPE is a name chosen by the admins for the GPU. For example, --gres=gpu:v100:1 would give you a V100 card.
Reserving resources from the short job queue for quick debugging
There is a gpushort
partition with a time limit of 4 hours that
often has space (like with other partitions, this is automatically
selected for short jobs). As of early 2022, it has four Tesla P100
cards in it (view with slurm partitions | grep gpushort
). If you
are doing testing and development and these GPUs meet your needs, you
may be able to test much faster here. Use -p gpushort
for this.
CUDA libraries not found
If you ever get libcuda.so.1: cannot open shared object file: No such
file or directory
, this means you are attempting to use a CUDA
program on a node without a GPU. This especially happens if you try
to test a GPU code on the login node.
Another problem may occur when a program tries to use pre-compiled kernels but the corresponding CUDA toolkit is not available. This can happen if you used a cuda module to compile the code and it is not loaded when you try to run the code.
If you’re using Python, see the section on CUDA libraries and Python.
CUDA libraries and Python deep learning frameworks
When using a Python deep learning framework such as TensorFlow or PyTorch, you usually need to create a conda environment that contains both the framework and the CUDA toolkit that the framework needs.
We recommend that you either use our centrally installed module that contains both frameworks (more info here) or install your own environment using the instructions presented here. These instructions make certain that the installed framework has a corresponding CUDA toolkit available. See the application list for more details on specific frameworks.
Please note that pre-installed software either has CUDA already present or it loads the needed modules. Thus you do not need to explicitly load CUDA from the module system when loading these.
Setting CUDA architecture flags when compiling GPU codes
Many GPU codes come with precompiled kernels, but in some cases you might need to compile your own kernels. When this is the case you’ll want to give the compiler flags that make it possible to run the code on multiple different GPU architectures.
For GPUs in Triton these flags are:
-arch=sm_60 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80
Here the architecture numbers (compute_XX/sm_XX) 60, 70 and 80 correspond to the GPU cards P100, V100 and A100, respectively.
For more information, you can check this excellent article or CUDA documentation on the subject.
Keeping GPUs occupied when doing deep learning
Many problems such as deep learning training are data-hungry. If you are loading large amounts of data you should make certain that the data loading is done in an efficient manner or the GPU will not be fully utilized.
All deep learning frameworks have their own guides on how to optimize the data loading, but they all are some variation of:
Store your data in multiple big files.
Create code that loads data from these big files.
Run optional pre-processing functions on the data.
Create a batch of data out of individual data samples.
Steps 2 and 3 are usually parallelized across multiple CPUs. Using pipelines such as these can dramatically speed up the training procedure.
If your data consists of individual files that are not too big, it is a good idea to have the data stored in one file, which is then copied to the node's ramdisk /dev/shm or temporary disk /tmp.
Avoiding small files is in general a good rule to follow. Please refer to the small files page for more detailed information.
If your data is too big to fit on the local disk, we recommend that you contact us for efficient data-handling models.
For more information on suggested data loading procedures for different frameworks, see Tensorflow’s and PyTorch’s guides on efficient data loading.
Profiling GPU usage with nvprof
When using NVIDIA’s GPUs you can try to use a profiling tool
called nvprof
to monitor what took most of the GPU’s
time during the code’s execution.
Sample output might look something like this:
==30251== NVPROF is profiling process 30251, command: ./pi-gpu 1000000000
==30251== Profiling application: ./pi-gpu 1000000000
==30251== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 84.82% 11.442ms 1 11.442ms 11.442ms 11.442ms throw_dart(curandStateXORWOW*, int*, unsigned long*)
14.70% 1.9833ms 1 1.9833ms 1.9833ms 1.9833ms setup_rng(curandStateXORWOW*, unsigned long)
0.30% 40.704us 1 40.704us 40.704us 40.704us [CUDA memcpy DtoH]
0.17% 23.328us 1 23.328us 23.328us 23.328us [CUDA memcpy HtoD]
API calls: 89.52% 122.81ms 3 40.936ms 3.6360us 122.70ms cudaMalloc
10.05% 13.794ms 2 6.8969ms 68.246us 13.726ms cudaMemcpy
0.20% 269.55us 3 89.851us 11.283us 130.45us cudaFree
0.14% 196.08us 101 1.9410us 122ns 83.854us cuDeviceGetAttribute
0.04% 57.228us 2 28.614us 6.3760us 50.852us cudaLaunchKernel
0.02% 32.426us 1 32.426us 32.426us 32.426us cuDeviceGetName
0.01% 13.677us 1 13.677us 13.677us 13.677us cuDeviceGetPCIBusId
0.01% 10.998us 1 10.998us 10.998us 10.998us cudaGetDevice
0.00% 2.3540us 1 2.3540us 2.3540us 2.3540us cudaGetDeviceCount
0.00% 1.2690us 3 423ns 207ns 850ns cuDeviceGetCount
0.00% 663ns 2 331ns 170ns 493ns cuDeviceGet
0.00% 656ns 1 656ns 656ns 656ns cuDeviceTotalMem
0.00% 396ns 1 396ns 396ns 396ns cuModuleGetLoadingMode
0.00% 234ns 1 234ns 234ns 234ns cuDeviceGetUuid
This output shows that most of the computing time was spent in the throw_dart kernel. It is important to note that in this example the memory allocation (cudaMalloc) and memory copies (cudaMemcpy) used more time than the actual computation. Memory operations are time-consuming, so the best codes try to minimize the need for them.
To see a chronological order of the different GPU operations, one can also run nvprof --print-gpu-trace. The output will look something like this:
==31050== NVPROF is profiling process 31050, command: ./pi-gpu 1000000000
==31050== Profiling application: ./pi-gpu 1000000000
==31050== Profiling result:
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput SrcMemType DstMemType Device Context Stream Name
182.84ms 23.136us - - - - - 256.00KB 10.552GB/s Pageable Device Tesla P100-PCIE 1 7 [CUDA memcpy HtoD]
182.89ms 1.9769ms (512 1 1) (128 1 1) 31 0B 0B - - - - Tesla P100-PCIE 1 7 setup_rng(curandStateXORWOW*, unsigned long) [118]
184.87ms 11.450ms (512 1 1) (128 1 1) 19 0B 0B - - - - Tesla P100-PCIE 1 7 throw_dart(curandStateXORWOW*, int*, unsigned long*) [119]
196.33ms 40.704us - - - - - 512.00KB 11.996GB/s Device Pageable Tesla P100-PCIE 1 7 [CUDA memcpy DtoH]
Regs: Number of registers used per CUDA thread. This number includes registers used internally by the CUDA driver and/or tools and can be more than what the compiler shows.
SSMem: Static shared memory allocated per CUDA block.
DSMem: Dynamic shared memory allocated per CUDA block.
SrcMemType: The type of source memory accessed by memory operation/copy
DstMemType: The type of destination memory accessed by memory operation/copy
Here we see that the sample code did a memory copy to the device, ran kernel setup_rng
,
ran kernel throw_dart
and did a memory copy back to the host memory.
For more information on nvprof
, see NVIDIA’s documentation on it.
Available GPUs and architectures
Card | Slurm feature name (--constraint=) | Slurm gres name (--gres=gpu:NAME:n) | total amount | nodes | architecture | compute threads per GPU | memory per card | CUDA compute capability
---|---|---|---|---|---|---|---|---
Tesla K80* | kepler | teslak80 | 12 | gpu[20-22] | Kepler | 2x2496 | 2x12GB | 3.7
Tesla P100 | pascal | teslap100 | 20 | gpu[23-27] | Pascal | 3584 | 16GB | 6.0
Tesla V100 | volta | v100 | 40 | gpu[1-10] | Volta | 5120 | 32GB | 7.0
Tesla V100 | volta | v100 | 40 | gpu[28-37] | Volta | 5120 | 32GB | 7.0
Tesla V100 | volta | v100 | 16 | dgx[1-7] | Volta | 5120 | 16GB | 7.0
Tesla A100 | ampere | a100 | 56 | gpu[11-17,38-44] | Ampere | 7936 | 80GB | 8.0
AMD MI100 (testing) | | use the separate AMD GPU partition | | gpuamd[1] | | | |
Exercises
The scripts you need for the following exercises can be found in our
hpc-examples, which
we discussed in Using the cluster from a shell.
You can clone the repository by running
git clone https://github.com/AaltoSciComp/hpc-examples.git
. Doing this creates a local copy of the repository in your current working directory. This repository will be used for most of the tutorial exercises.
GPU 1: Test nvidia-smi
Run nvidia-smi
on a GPU node with srun
. Use slurm history
to check which GPU node you ended up on.
GPU 2: Running the example
Run the example given above with a larger number of trials (10000000000 or \(10^{10}\)).
Try using sbatch and a Slurm script as well.
GPU 3: Run the script and do basic profiling with nvprof
nvprof is part of NVIDIA's profiling tools, and it can be used to monitor which parts of the GPU code use up the most time. Run the program as before, but add nvprof before it.
Try running the program with chronological trace mode
(nvprof --print-gpu-trace
) as well.
Solution
With srun
you can run the profiling as follows:
srun --time=00:10:00 --mem=500M --gres=gpu:1 nvprof ./pi-gpu 10000000000
To get the trace output, you need to add the --print-gpu-trace
-flag:
srun --time=00:10:00 --mem=500M --gres=gpu:1 nvprof --print-gpu-trace ./pi-gpu 10000000000
You should see output similar to ones shown in the section profiling GPU usage with nvprof.
GPU 4: Your program
Think of your program. Do you think it can utilize GPUs?
If you do not know, you can check the program’s documentation for words such as:
GPU
CUDA
ROCm
OpenMP offloading
OpenACC
OpenCL
…
See also
If you aren’t fully sure of how to scale up, contact us Research Software Engineers early.
What’s next?
You have now seen the basics - but applying these in practice is still a difficult challenge! There is plenty to figure out while combining your own software, the Linux environment, and Slurm.
Your time is the most valuable thing you have. If you aren't fully sure of how to use the tools, it is much better to ask than to struggle forever. Contact us, the Research Software Engineers, early - for example in our daily garage - and we can help you get set up well. Then you can continue learning while your projects are progressing.
Job dependencies
Introduction
Job dependencies are a way to specify dependencies between jobs. The most common use is to launch a job only after a previous job has completed successfully. But other kinds of dependencies are also possible.
Basic example
Dependencies are specified with the --dependency=DEPENDENCY_LIST
option. E.g. --dependency=afterok:123:124 means that the job can only start after jobs 123 and 124 have both completed successfully.
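Some other dependency types that may be useful (see man sbatch for the full list):
--dependency=afterany:123    # start after job 123 finishes, regardless of its exit status
--dependency=afternotok:123  # start only if job 123 failed
--dependency=singleton       # start once all earlier jobs with the same job name have finished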
Automating job dependencies
A common problem with job dependencies is that you want job B to start only after job A finishes successfully. However, you cannot know the job ID of job A before it has been submitted. One solution is to capture the job ID of job A when submitting it, store it in a shell variable, and use the stored value when submitting job B. For example:
$ idA=$(sbatch jobA.sh | awk '{print $4}')
$ sbatch --dependency=afterok:${idA} jobB.sh
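sbatch also has a --parsable option that prints just the job ID, which avoids parsing the output with awk:
$ idA=$(sbatch --parsable jobA.sh)
$ sbatch --dependency=afterok:${idA} jobB.sh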
Exercises
Dependencies-1: read the docs
Look at man sbatch
and investigate the --dependency
parameter.
Dependencies-2: Chain of jobs
Create a chain of jobs A -> B -> C each depending on the successful
completion of the previous job. In each job run e.g. sleep 60
to give you time to investigate the status of the queue.
Solution
You should see all of your jobs in the queue. Jobs depending on a previous job will have a pending status, with the dependency stated as the reason.
Dependencies-3: First job fails
Continuing from the previous exercise, what happens if at the end of the job A script you put exit 1? What does it mean?
Solution
Putting exit 1 at the end of your job script means it returns a unix exit code indicating failure. The next jobs in your dependency list will sit in the queue forever, since as far as they know the previous job never completed successfully.
Applications
See our general information and the full list below:
Applications: General info
See also
Intro tutorial: Applications (this is assumed knowledge for all software instructions)
When you need software, check the following for instructions (roughly in this order):
This page.
Search the SciComp site using the search function.
Check module spider and module avail to see if something is available but undocumented.
The issue tracker for other people who have asked - some instructions only live there.
If you have difficulty, it’s usually a good idea to search the issue tracker anyway, in order to learn from the experience of others.
Modules
See Software modules. Modules are the standard way of loading software.
Singularity
See Singularity Containers. Singularity provides software containers: an operating system within an operating system. The software's instructions will tell you if you need to use it via Singularity.
Software installation and policy
We want to support all software, but unfortunately time is limited. In the chart below, we have these categories (which don’t really mean anything, but in the future should help us be more transparent about what we are able to support):
A: Full support and documentation, should always work
B: We install and provide best-effort documentation, but may be out of date.
C: Basic info, no guarantees
If you know some application which is missing from this list but is widely in use (someone other than you is using it), it would make sense to install it to the /share/apps/ directory and create a module file. Send your request to the tracker. We want to support as much software as possible, but unfortunately we don't have the resources to do everything centrally.
Software is generally easy to install if it is in Spack (check the package list page), a scientific software management and building system. If it has easy-to-install Ubuntu packages, it will be easy to install via Singularity.
Software documentation pages
Name | Support level
---|---
Python | A
FHI-aims
FHI-aims (Fritz Haber Institute ab initio molecular simulations package) is an electronic structure theory code package for computational molecular and materials science. FHI-aims performs density functional theory and many-body perturbation calculations at the all-electron, full-potential level.
FHI-aims is licensed software with voluntary payment for an academic license. While the license grants access to the FHI-aims source code, each holder of a license can use the pre-built binaries available on Triton. To this end, contact Ville Havu at the PHYS department after obtaining the license.
On Triton the most recent version of FHI-aims is available via the
modules FHI-aims/latest-intel-2020.0
that is compiled using the
Intel Parallel Studio and
FHI-aims/latest-OpenMPI-intel-2020.0-scalapack
that is compiled
without any Intel parallel libraries since in rare cases they can
result in spurious segfaults. The binaries are available in
/share/apps/easybuild/software/FHI-aims/<module name>/bin
as
aims.YYMMDD.scalapack.mpi.x
where YYMMDD
indicates the version
stamp.
Notes:
module spider fhi will show the various versions available.
The clean Intel version is fastest, but the OpenMPI module is more stable (info as of 2021-07).
The OpenMPI module is compiled without any Intel parallel libraries since in rare cases, like really big systems, they can result in spurious segfaults.
Search the Triton issue tracker for more debugging notes about this.
Running FHI-aims on Triton
To run FHI-aims on Triton, the following example batch script can be used:
#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --constraint=avx # FHI-aims build requires at least the AVX instruction set
#SBATCH --mem-per-cpu=2000M
#SBATCH --nodes=1
#SBATCH --ntasks=24
ulimit -s unlimited
export OMP_NUM_THREADS=1
module load FHI-aims/latest-intel-2020.0
srun aims.YYMMDD.scalapack.mpi.x
Armadillo
Support level: C
Armadillo (http://arma.sourceforge.net/) is a C++ linear algebra library that is needed to support some other software stacks. For the best performance, using MKL as the backend is advised.
The challenge is that the default installer does not find MKL in a non-standard location:
module load mkl
Edit “./build_aux/cmake/Modules/ARMA_FindMKL.cmake” and add MKL path to “PATHS”
Edit “./build_aux/cmake/Modules/ARMA_FindMKL.cmake” and replace mkl_intel_thread with mkl_sequential (we do not want threaded libs on the cluster)
Edit “include/armadillo_bits/config.hpp” and enable ARMA_64BIT_WORD
cmake . && make
make install DESTDIR=/share/apps/armadillo/<version>
Boost
Support level: C
Page last updated: 2014
Boost is a numerical library needed by some other packages. There is an rpm package of it in the default SL/RHEL repositories. In case the repository version is too old, a custom compilation is required.
To set it up, see the manual and follow the few simple steps to bootstrap and compile/install:
https://www.boost.org/doc/libs/1_56_0/more/getting_started/unix-variants.html
COMSOL Multiphysics
Hint
We are continuing the COMSOL focus days in our daily zoom garage in Spring 2024: someone from COMSOL (the company) plans to join our zoom garage at 13:00 on the following Tuesdays: 2024-01-23, 2024-02-27, 2024-03-26, 2024-04-23, 2024-05-28.
Hint
Join the other COMSOL users in our Zulip Chat: Stream “#triton”, topic “Comsol user group”.
To check which versions of Comsol are available, run:
module spider comsol
Comsol on Triton is best run in batch mode, i.e. without the graphical user interface. Prepare your models on your workstation and bring the ready-to-run models to Triton. For detailed tutorials from COMSOL, see for example the COMSOL Knowledge Base articles Running COMSOL® in parallel on clusters and Running parametric sweeps, batch sweeps, and cluster sweeps from the command line. However, various settings must be edited in the graphical user interface.
Best practices of using COMSOL Graphical User Interface in Triton
Connect to triton
Use Open OnDemand for the best experience for interactive work on triton.
Connect to https://ood.triton.aalto.fi with your browser, log in. (It currently takes a while, please be patient.) Choose “My Interactive Sessions” from top bar, and then “Triton Desktop” from bottom. Launch your session, and once resources become available in triton, the session will be started on one of the interactive session nodes of triton. You can connect to a desktop in your browser with the “Launch Triton Desktop” button.
Once you have connected, you can open a terminal (in XFCE the black rectangle in the bottom of the screen).
You can alternatively open a linux session in https://vdi.aalto.fi.
Open a terminal, and connect with ssh to triton login node
ssh -Y triton.aalto.fi
However, if you use this terminal to start COMSOL, it will be running on the login node, which is a shared resource, and you should be careful not to use too much memory or CPUs.
Start comsol
First make sure you have a graphical connection (this should print something like “:1.0”):
echo $DISPLAY
then load the comsol module (version of your choice)
module load comsol/6.1
and finally start comsol
comsol
Prerequisites for running COMSOL on Triton
There is a largish but limited pool of floating COMSOL licenses at Aalto University, so please be careful not to launch large numbers of comsol processes that each consume a separate license.
Comsol uses a lot of temp file storage, which by default goes to $HOME. Fix it like the following:
$ rm -rf ~/.comsol/
$ mkdir /scratch/work/$USER/comsol_recoveries/
$ ln -sT /scratch/work/$USER/comsol_recoveries/ ~/.comsol
You may need to enable access to the whole filesystem in File|Options –> Preferences –> Security: File system access: “All files”
Enable the “Study -> Batch and Cluster” as well as the “Study -> Solver and Job Configurations” nodes in the “Show More Options” dialog box, which you can open by right-clicking the study in the Model Builder tree.
The cluster settings can be saved in the comsol settings, not in the model file. The correct settings are entered in File|Options –> Preferences –> Multicore and Cluster Computing. It is enough to choose Scheduler type: “SLURM”.

You can test by loading from the Application Libraries the “cluster_setup_validation” model. The model comes with a documentation -pdf file, which you can open in the Application Libraries dialogue after selecting the model.
COMSOL requires MPICH2 compatible MPI libraries:
$ module purge
$ module load comsol/6.1 intel-parallel-studio/cluster.2020.0-intelmpi
A dictionary of COMSOL HPC lexicon
The knowledge base article Running COMSOL® in parallel on clusters explains the meanings COMSOL gives to the following terms:
COMSOL | SLURM & MPI | Meaning
---|---|---
node | task | A process, software concept
host | node | A single computer
core | cpu | A single CPU-core
However, COMSOL does not seem to be using the terms in a 100% consistent way. E.g. sometimes in the SLURM context COMSOL may use node in the SLURM meaning.
An example run in a single node
Use the parameters -clustersimple
and -launcher slurm
. Here is a sample batch-job:
#!/bin/bash
# Ask for e.g. 20 compute cores
#SBATCH --time=10:00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --cpus-per-task=20
cd $WRKDIR/my_comsol_directory
module load Java
module load comsol/6.1
module load intel-parallel-studio/cluster.2020.0-intelmpi
# Details of your input and output files
INPUTFILE=input_model.mph
OUTPUTFILE=output_model.mph
comsol batch -clustersimple -launcher slurm -inputfile $INPUTFILE -outputfile $OUTPUTFILE -tmpdir $TMPDIR
Cluster sweep
If you have a parameter scan to perform, you can use the Cluster sweep node. The whole sweep only needs one license even if comsol launches multiple instances of itself.
First set up the cluster preferences, as described above.
Start by loading the correct modules in triton (COMSOL requires MPICH2 compatible MPI libraries). Then open the graphical user interface to comsol on the login node and open your model.
$ module purge
$ module load comsol/6.1 intel-parallel-studio/cluster.2020.0-intelmpi
$ comsol
Add a “Cluster Sweep” node to your study and a “Cluster Computing” node into your “Job Configurations” (you may need to first enable them in “Show More Options”). Check the various options. You can try solving a small test case from the graphical user interface; you should see COMSOL submitting jobs to the SLURM queue. You can download an example file.
For a larger run, COMSOL can then submit the jobs with comsol but without the GUI:
$ comsol batch -inputfile your_ready_to_run_model.mph -outputfile output_file.mph -study std1 -mode desktop
See also: how to run a parametric sweep from the command line?
Since the sweep may take some time to finish, please consider using tmux or screen to keep your session open.
MATLAB + COMSOL – livelink
It is possible to control COMSOL with MATLAB. The blog post by KnifeLee was useful in preparation of this example.
Save a username and password for COMSOL mph server
Before your first use, you need to save the username and password for COMSOL mph server. On the login node, run:
$ module load comsol/6.1
$ comsol mphserver
And COMSOL will ask for you to choose a username and password. You can close the comsol server with “close”.
Please note that each instance of the process below uses a COMSOL licence, so this method is not useful for parameter scans.
Example files for batch job workflow
Please check the available versions and installation locations of comsol and update the scripts below accordingly:
module spider comsol
module show comsol/6.2
The installation folder is on the line with “prepend_path”.
Here is an example batch submit script comsol_matlab_livelink.sh
:
#!/bin/bash
#SBATCH --time=10:00:00
# Ask for a single node, since the port for connections between COMSOL and MATLAB is by default using port 2036,
# and this is an easy way to avoid clashes between multiple jobs.
#SBATCH --nodes=1
#SBATCH --exclusive
module load matlab
module load comsol/6.1
echo starting comsol server in the background
comsol mphserver &
echo comsol is now running
matlab -nodesktop -nosplash -r "runner;exit(0)"
echo matlab closed
The MATLAB process is running the runner.m
script:
disp('Including comsol routines into the path.')
addpath /share/apps/comsol/5.6/mli/
disp('Connecting to COMSOL from MATLAB')
mphstart(2036)
disp('Connection established')
disp('Starting Model Control Script')
script;
disp('Exiting Matlab')
exit(0);
The Model Control Script script.m
could be e.g. the following:
import com.comsol.model.*;
import com.comsol.model.util.*;
model = ModelUtil.create('Model1');
model.component.create('comp1', true);
%...
The job is submitted with:
$ sbatch comsol_matlab_livelink.sh
Cluster computing controlled from your windows workstation
The following example shows a working set of settings to use triton as a remote computation cluster for COMSOL.
Prerequisites:
Store your ssh keys in Pageant so that you can connect to Triton with PuTTY without entering the password.
Save / install putty executables locally, e.g. in Z:\putty:
plink.exe
pscp.exe
putty.exe
In this configuration, sjjamsa is replaced with your username.
Deep learning software
This page has information on how to run deep learning frameworks on Triton GPUs.
Theano
Installation
The recommended way of installing Theano is with an anaconda environment.
Detectron
Detectron uses Singularity containers, so you should refer to that page first for general information.
Detectron-image is based on a Dockerfile from Detectron’s repository. In this image Detectron has been installed to /detectron.
Usage
This example shows how you can launch Detectron on a GPU node. To run the example given in the Detectron repository, one can use the following Slurm script:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem=8G
#SBATCH --gres=gpu:teslap100:1
#SBATCH -o detectron.out
module load singularity-detectron
mkdir -p $WRKDIR/detectron/outputs
singularity_wrapper exec python2 /detectron/tools/infer_simple.py \
--cfg /detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
--output-dir $WRKDIR/detectron/outputs \
--image-ext jpg \
--wts https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
/detectron/demo
Now the example can be run on a GPU node with:
sbatch detectron.sh
In typical usage one does not want to download models for each run. To use stored models one needs to:
Copy detectron sample configurations from the image to your own configuration folder:
module load singularity-detectron
mkdir -p $WRKDIR/detectron/
singularity_wrapper exec cp -r /detectron/configs $WRKDIR/detectron/configs
cd $WRKDIR/detectron
Create data directory and download example models there:
mkdir -p data/ImageNetPretrained/MSRA
mkdir -p data/coco_2014_train:coco_2014_valminusminival/generalized_rcnn
wget -O data/ImageNetPretrained/MSRA/R-101.pkl \
    https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
wget -O data/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
    https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl
Edit the WEIGHTS parameter in the configuration file 12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml:
33c33
< WEIGHTS: $WRKDIR/detectron/data/ImageNetPretrained/MSRA/R-101.pkl
---
> WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
Edit the Slurm script to point to the downloaded weights and models:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem=8G
#SBATCH --gres=gpu:teslap100:1
#SBATCH -o detectron.out
module load singularity-detectron
mkdir -p $WRKDIR/detectron/outputs
singularity_wrapper exec python2 /detectron/tools/infer_simple.py \
--cfg $WRKDIR/detectron/configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml \
--output-dir $WRKDIR/detectron/outputs \
--image-ext jpg \
--wts $WRKDIR/detectron/data/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl \
/detectron/demo
Submit job:
sbatch detectron.sh
Fenics
This uses Singularity containers, so you should refer to that page first for general information.
Fenics-images are based on these images.
Usage
This example shows how you can run a fenics example. First copy the examples from the image to a suitable folder:
mkdir -p $WRKDIR/fenics
cd $WRKDIR/fenics
module load singularity-fenics
singularity_wrapper exec cp -r /usr/local/share/dolfin/demo demo
The examples try to use interactive windows to plot the results. This is not available in the batch queue, so to fix this one needs to specify an alternative matplotlib backend. This patch file fixes the example demo_poisson.py. Download it into $WRKDIR/fenics and run
patch -d demo -p1 < fenics_matplotlib.patch
to fix the example. After this, one can run the example with the following Slurm script:
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=1G
#SBATCH -o fenics_out.out
module purge
module load singularity-fenics
cd demo/documented/poisson/python/
srun singularity_wrapper run demo_poisson.py
To submit the script one only needs to run:
sbatch fenics.sh
The resulting image can be checked with e.g.:
eog demo/documented/poisson/python/poisson.png
FMRIprep
December 2022: Note that the previous module we had installed (fmriprep 20.2.0) has been FLAGGED by the developers. Please specify a different version, for example with module load singularity-fmriprep/22.1.0.
module load singularity-fmriprep
fmriprep is installed as a singularity container. By default it will always run the latest installed version. If you need a version that is not currently installed on Triton, please open an issue at https://version.aalto.fi/gitlab/AaltoScienceIT/triton/issues
Here is an example of running fmriprep for one subject in an interactive session, without FreeSurfer recon-all, using ICA-AROMA, and with co-registration to the 2mm isotropic MNI template space (MNI152NLin6Asym, FSL bounding box of 91x109x91 voxels). The raw data in BIDS format are in the path <path-to-bids>. Create a folder for the derivatives that is different from the BIDS folder, <path-to-your-derivatives-folder>. Also create a temporary folder under your scratch/work folders for storing temporary files, <path-to-your-scratch-temporary-folder>, for example /scratch/work/USERNAME/tmp/. The content of this folder is removed after fmriprep has finished.
# Example running in an interactive session, this can be at maximum 24 hours
# You might want to use a tool such as "screen" or "tmux"
ssh triton.aalto.fi
# start screen or tmux
sinteractive --time=24:00:00 --mem=20G # you might need more or less memory or time depending on the size
module load singularity-fmriprep
singularity_wrapper exec fmriprep <path-to-bids> <path-to-your-derivatives-folder> -w <path-to-your-scratch-temporary-folder-for-this-participant> participant --participant-label 01 --output-spaces MNI152NLin6Asym:res-2 --use-aroma --fs-no-reconall --fs-license-file /scratch/shareddata/set1/freesurfer/license.txt
If you want to parallelize things you can write a script that cycles through the subject labels and queues an sbatch job for each subject (it can be an array job or a series of serial jobs), as sketched below. It is important to tune your memory and time requirements before processing many subjects at once, and to create a dedicated temporary scratch folder for each subject.
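As a minimal sketch of such a parallelization (not from the original instructions): an array job where the subject label is derived from the array task ID and each task gets its own temporary folder. The subject numbering, paths, and resource numbers are assumptions you must adapt.
#!/bin/bash
#SBATCH --time=24:00:00
#SBATCH --mem=20G
#SBATCH --array=1-10    # assumption: subjects labeled 01..10
# Zero-pad the array task ID to get the BIDS participant label
SUBJECT=$(printf "%02d" $SLURM_ARRAY_TASK_ID)
# One dedicated temporary folder per subject, as recommended above
TMPDIR_FMRIPREP=/scratch/work/USERNAME/tmp/sub-$SUBJECT
mkdir -p $TMPDIR_FMRIPREP
module load singularity-fmriprep
singularity_wrapper exec fmriprep <path-to-bids> <path-to-your-derivatives-folder> \
    -w $TMPDIR_FMRIPREP participant --participant-label $SUBJECT \
    --output-spaces MNI152NLin6Asym:res-2 --use-aroma --fs-no-reconall \
    --fs-license-file /scratch/shareddata/set1/freesurfer/license.txt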
POST-processing
Fmriprep does the minimal preprocessing. There is no smoothing, no temporal filtering, and in general you need to regress out the estimated confounds. They can be regressed out before further analysis (e.g. functional connectivity, intersubject correlation), or they can be included as part of a general linear model (it is always best to have them as close as possible to the model, if this is what you are doing). If you plan to regress out the confounds outside of a general linear model, the simplest way is to decide which columns of the “confounds.tsv” matrix you want to use as confounds and use Nilearn's clean_img: https://nilearn.github.io/dev/modules/generated/nilearn.image.clean_img.html
There are also dedicated tools for post-processing. These are not installed on the singularity image, hence you need to experiment with these on your own.
Freesurfer
module load freesurfer
Follow the instructions to source the init script specific to your shell.
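For a bash-like shell this typically looks like the following (a sketch, assuming the module sets $FREESURFER_HOME the way an upstream FreeSurfer installation does):
module load freesurfer
# Standard FreeSurfer setup script for bash/sh
source $FREESURFER_HOME/SetUpFreeSurfer.sh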
FSL
module load fsl
Follow the instructions to source the init script specific to your shell.
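For a bash-like shell this typically looks like the following (a sketch, assuming the module sets $FSLDIR as upstream FSL does):
module load fsl
# Standard FSL configuration script for bash/sh
source $FSLDIR/etc/fslconf/fsl.sh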
GCC
GNU Compiler Collection (GCC) is one of the most popular compilers for compiling C, C++ and Fortran programs.
On Triton we have various GCC versions installed, but only some of them are actively supported.
Basic usage
Hello world in C
Let’s consider the following Hello world program (hello.c) written in C.
#include <stdio.h>
int main()
{
printf("Hello world.\n");
return 0;
}
After downloading it to a folder, we can compile it with GCC.
First, let’s load up a GCC module:
module load gcc/8.4.0
Secondly, let’s compile the code:
gcc -o hello hello.c
Now we can run the program:
./hello
This outputs the expected Hello world-string.
Available installations
System compiler is installed only on the login node. Other versions of GCC are installed as modules.
GCC version | Module name
---|---
4.8.5 | none (on login node only)
8.4.0 | gcc/8.4.0
9.3.0 | gcc/9.3.0
11.2.0 | gcc/11.2.0
If you need a different version of GCC, please send a request through the issue tracker.
Old installations
These installations will work, but they are not actively updated.
GCC version | Module name
---|---
6.5.0 | gcc/6.5.0
9.2.0 | gcc/9.2.0
9.2.0 (with CUDA offloading) | gcc/9.2.0-cuda-nvptx
Other old installations are not recommended.
GPAW
There is a GPAW version installed as GPAW/1.0.0-goolf-triton-2016a-Python-2.7.11. It has been compiled with GCC, OpenBLAS and OpenMPI, and it uses Python/2.7.11-goolf-triton-2016a as its base Python. You can load it with:
$ module load GPAW/1.0.0-goolf-triton-2016a-Python-2.7.11
You can create a virtual environment against the Python environment with:
$ export VENV=/path/to/env
$ virtualenv --system-site-packages $VENV
$ cd $VENV
$ source bin/activate
# test installation
$ python -c 'import gpaw; print gpaw'
GPAW site: https://wiki.fysik.dtu.dk/gpaw/
Gurobi Optimizer
Gurobi Optimizer is a commercial optimizing library.
License
Aalto University has a site-wide floating license for Gurobi.
Important notes
As of writing this guide, Aalto only has a valid license for Gurobi 9.X and older. Therefore Gurobi 10 cannot be run on Triton unless you bring your own license.
Gurobi with Python
Package names
Unfortunately, the Python Gurobi packages installed via pip and via conda come with two distinct package names: gurobi for the anaconda package and gurobipy for the pip package. Normally, we install the gurobi package in the anaconda environment, but there are some anaconda modules which have the gurobipy package. So you might need to select the correct package.
License Files for older Anaconda modules
Older anaconda modules on Triton might not have the GRB_LICENSE_FILE environment variable set properly, so you might need to point to it manually. To do so, you need to create a gurobi.lic file in your home folder. The file should contain the following single line:
TOKENSERVER=lic-gurobi.aalto.fi
You can create this license file with the following command on the login node:
echo "TOKENSERVER=lic-gurobi.aalto.fi" > ~/gurobi.lic
The license is an Educational Institution Site License:
Free Academic License Requirements, Gurobi Academic Licenses: Can only be used by faculty, students, or staff of a recognized degree-granting academic institution. Can be used for: Research or educational purposes. Consulting projects with industry – provided that approval from Gurobi has been granted.
After setting the license, one can run, for example:
module load anaconda
python
And then run the following script:
import gurobipy as gp
# Depending on your anaconda version you
# might need gurobi instead of gurobipy
# Create a new model
m = gp.Model()
# Create variables
x = m.addVar(vtype='B', name="x")
y = m.addVar(vtype='B', name="y")
z = m.addVar(vtype='B', name="z")
# Set objective function
m.setObjective(x + y + 2 * z, gp.GRB.MAXIMIZE)
# Add constraints
m.addConstr(x + 2 * y + 3 * z <= 4)
m.addConstr(x + y >= 1)
# Solve it!
m.optimize()
print(f"Optimal objective value: {m.objVal}")
print(f"Solution values: x={x.X}, y={y.X}, z={z.X}")
Gurobi with Julia
For Julia there exists a package called Gurobi.jl that provides an interface to Gurobi. This package needs the Gurobi C libraries so that it can run. The easiest way of obtaining these libraries is to load the anaconda module and use the same libraries that the Python API uses.
To install Gurobi.jl, one can use the following commands:
module load gurobi/9.5.2
module load julia
julia
After this, in the julia shell, install Gurobi.jl with:
using Pkg
Pkg.add("Gurobi")
Pkg.build("Gurobi")
# Test installation
using Gurobi
Gurobi.Optimizer()
Before using the package, do note the recommendations on Gurobi.jl's GitHub page regarding the use of JuMP.jl and the reuse of environments.
Gurobi with any other language supported by gurobi
For other languages supported by gurobi (like MATLAB, R or C/C++) use
module load gurobi/9.5.2
to load Gurobi version 9.5.2, and then follow the instructions from the Gurobi web page. All global variables necessary for Gurobi are already set, so you don't need any further configuration.
Intel Compilers
Intel provides their own compiler suite, which is popular in HPC settings. This suite contains compilers for C (icc), C++ (icpc) and Fortran (ifort).
Previously this suite was licensed, but nowadays Intel provides it for free as part of their oneAPI program. This change has had an effect on many module names.
On Triton we have various versions of the Intel compiler suite installed, but only some of them are actively supported.
Basic usage
Choosing a GCC for Intel compilers
Intel uses many tools from the GCC suite, and thus it is recommended to load a gcc module with it:
module load gcc/8.4.0 intel-oneapi-compilers/2021.4.0
See GCC page for more information on available GCC compilers.
Hello world in C
Let’s consider the following Hello world-program
(hello.c
)
written in C.
#include <stdio.h>
int main()
{
printf("Hello world.\n");
return 0;
}
After downloading it to a folder, we can compile it with the Intel C compiler (icc). First, let's load up the Intel compilers and a GCC module that icc will use in the background:
module load gcc/8.4.0 intel-oneapi-compilers/2021.4.0
Now let’s compile the code:
icc -o hello hello.c
Now we can run the program:
./hello
This outputs the expected Hello world-string.
Current installations
There are various Intel compiler versions installed as modules.
Intel compiler version | Module
---|---
2021.2.0 | intel-oneapi-compilers/2021.2.0
2021.3.0 | intel-oneapi-compilers/2021.3.0
2021.4.0 | intel-oneapi-compilers/2021.4.0
If you need a different version of these compilers, please send a request through the issue tracker.
Old installations
These installations will work, but they are not actively updated.
Intel compiler version | Module
---|---
2019.3 with Intel MPI | intel-parallel-studio/cluster.2019.3-intelmpi
2019.3 | intel-parallel-studio/cluster.2019.3
2020.0 with Intel MPI | intel-parallel-studio/cluster.2020.0-intelmpi
2020.0 | intel-parallel-studio/cluster.2020.0
Other old installations are not recommended.
Julia
The Julia programming language is a high-level, high-performance dynamic programming language for technical computing, in the same space as e.g. MATLAB, Scientific Python, or R. For more details, see their web page.
Interactive usage
Julia is available in the module system. By default the latest stable release is loaded:
module load julia
julia
Batch usage
Running Julia scripts as batch jobs is also possible. An example batch script is provided below:
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --mem=1G
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
module load julia
srun julia juliascript.jl
Number of threads to use
By default Julia uses up to 16 threads for linear algebra (BLAS) computations. In most cases, this number will be larger than the number of CPUs reserved for the job. Thus when running Julia jobs it is a good idea to set the number of BLAS threads equal to the number of CPUs reserved with --cpus-per-task; otherwise, the performance of your program might be poor. This can be done by adding the following line to your slurm script:
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
Alternatively, you can use the BLAS.set_num_threads() function (from the LinearAlgebra standard library) in Julia.
JupyterHub on Triton
Note
For new users
Are you new to Triton and want to access JupyterHub? Triton is a high-performance computing cluster, and JupyterHub is just one of our services - one of the easiest ways to get started. You still need a Triton account. This site has many instructions, but you should read at least:
About us, how to get help, and acknowledging Triton usage (this JupyterHub is part of Triton, and thus Science-IT must be acknowledged in publications).
The accounts page, in order to request a Triton account.
Possibly the storage page and remote data access page to learn about the places to store data and how to transfer data.
The JupyterHub section of this page (below).
If you want to use Triton more, you should finish the entire tutorials section.

[Video: Triton JupyterHub Demo]
Jupyter notebooks are a way of interactive, web-based computing: instead of either scripts or interactive shells, the notebooks allow you to see a whole script + output and experiment interactively and visually. They are good for developing and testing things, but once things work and you need to scale up, it is best to put your code into proper programs (more info). You must do this if you are going to do large parallel computing.
Triton's JupyterHub is available at https://jupyter.triton.aalto.fi. You can try notebooks online at try.jupyter.org (there is a temporary notebook with no saving).
You can always run notebooks yourself on your own (or remote) computers, but on Triton we have some facilities already set up to make it easier.
How Jupyter notebooks work
Start a notebook.
Enter some code into a cell.
Run it with the buttons, or Control-enter or Shift-enter.
Edit/create new cells, run again. Repeat indefinitely.
You have a visual history of what you have run, with code and results nicely interspersed. With certain languages such as Python, you can have plots and other things embedded, so that it becomes a complete reproducible story.
JupyterLab is the next iteration of this and has many more features, making it closer to an IDE or RStudio.
Notebooks are without a doubt a great tool. However, they are only one tool, and you need to know their limitations. See our other page on limitations of notebooks.
JupyterHub
Note
JupyterHub on Triton is still under development, and features will be added as they are needed or requested. Please use the Triton issue tracker.
The easiest way of using Jupyter is through JupyterHub - it is a multi-user jupyter server which takes a web-based login and spawns your own single-user server. This is available on Triton.
Connecting and starting
Currently JupyterHub is available only within Aalto networks, or from the rest of the internet after a first Aalto login: https://jupyter.triton.aalto.fi.
Once you log in, you must start your single-user server. There are
several options available that trade off between long run time and
short run time but more memory available. Your server runs in the
Slurm queue, so the first start-up takes a few seconds but after that
it will stay running even if you log out. The resources you request
are managed by slurm: if you go over the memory limit, your server
will be killed without warning or notification (but you can see it in
the output log, ~/jupyterhub_slurmspawner_*.log). The Jupyter
server nodes are oversubscribed, which means that we can allocate more
memory and CPU than is actually available. We will monitor the nodes
to try to ensure that there are enough resources available, so do
report problems to us. Please request the minimum amount of memory
you think you need - you can always restart with more memory. You
can go over your memory request a little bit before you get problems.
When you use Jupyter via this interface, the slurm billing weights are lower, so that the rest of your Triton priority does not decrease by as much.
Usage
Once you get to your single-user server, Jupyter is running as your own user on Triton. You begin in a convenience directory which has links to home, scratch, etc. You can not make files in this directory (it is read-only), but you can navigate to the other folders to create your notebooks. You have access to all the Triton filesystems (not project/archive) and all normal software.
We have some basic extensions installed:
Jupyterlab (to use it, change /tree in the URL to /lab). Jupyterlab will eventually be made the default.
modules integration
jupyter_contrib_nbextensions - check out the variable inspector
diff and merge tools (currently does not work somehow)
The log files for your single-user servers can be found at ~/jupyterhub_slurmspawner_*.log. When a new server starts, log files older than one week are automatically cleaned up.
For reasons of web security, you can’t install your own extensions (but you can install your own kernels). Send your requests to us instead.
Problems? Requests?
This service is currently in beta and under active development. If you notice problems or would like any more extensions or features, let us know. If this is useful to you, please let us know your user story, too. In the current development stage, the threshold for feedback should be very low.
Currently, the service level is best effort. The service may go down at any time and/or notebooks may be killed whenever there is a shortage of resources or need of maintenance. However, notebooks auto-save and do survive service restarts, and we will try to avoid killing things unnecessarily.
Software and kernels
A Jupyter Kernel is the runtime which actually executes the code in the notebook (and it is separate from JupyterHub/Jupyter itself). We have various kernels automatically installed (these instructions should apply to both JupyterHub and sjupyter):
Python (2 and 3 via some recent anaconda modules, plus a few more Python modules)
Matlab (latest module)
Bash kernel
R (a default R environment you can get by module load r-triton). “R (safe)” is similar but tries to block some local user configuration which sometimes breaks things; see the FAQ for more hints.
We do not yet have a kernel management policy. Kernels may be added or removed over time. We would like to keep them synced with the most common Triton modules, but it will take some time to get this automatic. Send requests and problem reports.
Since these are the normal Triton modules, you can submit installation requests for software in these so that it is automatically available.
What’s a kernel? Where are they?
As stated at the start of this section, the kernel is what actually runs the code. An example of a kernel command line is python -m ipykernel_launcher -f {connection_file}. Which python this starts depends on the environment, unless an absolute path is given.
You can list your installed kernels with jupyter kernelspec list (to ensure the list is the same as jupyter.triton sees, module load jupyterhub/live first). Look in these directories, at kernel.json, to see just what each kernel does.
You can remove kernels by removing their directory or with jupyter kernelspec remove.
The program envkernel can serve as a wrapper to a) modify kernel.json files and b) adjust the environment (e.g. loading modules) at runtime, which can be hard to fully emulate by statically defining environment variables in kernel.json.
Installing kernels from virtualenvs or Anaconda environments
If you want to use Jupyter with your own packages, you can do that. First, make a conda environment / virtual environment on Triton and install the software you need in it (see Anaconda and conda environments or Python: virtualenv). This environment can be used for other things, such as your own development outside of Jupyter.
You have to have the package ipykernel installed in the environment: add it to your requirements/environment, or activate the environment and do pip install ipykernel.
Then, you need to make the environment visible inside of Jupyter. For conda environments, you can do:
$ module load jupyterhub/live
$ envkernel conda --user --name INTERNAL_NAME --display-name="My conda" /path/to/conda_env
Or for Python virtualenvs:
$ module load jupyterhub/live
$ envkernel virtualenv --user --name INTERNAL_NAME --display-name="My virtualenv" /path/to/virtualenv
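Putting these steps together for a virtualenv, the whole flow might look like the following sketch (the environment path, kernel name, and display name are placeholders you choose yourself):
# Create a virtualenv and install your packages plus ipykernel
python3 -m venv /scratch/work/USERNAME/myenv
source /scratch/work/USERNAME/myenv/bin/activate
pip install ipykernel    # plus whatever packages you need
deactivate
# Register the environment as a Jupyter kernel
module load jupyterhub/live
envkernel virtualenv --user --name myenv --display-name="My virtualenv" /scratch/work/USERNAME/myenv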
Installing a different R module as a kernel
Load your R modules, install R kernel normally (to some NAME
),
use envkernel as a wrapper to re-write the kernel (reading the
NAME
and rewriting to the same NAME
), after it loads the
modules you need:
## Load jupyterhub/live, and R 3.6.1 with IRkernel.
$ module load r-irkernel/1.1-python3
$ module load jupyterhub/live
## Use Rscript to install jupyter kernel
$ Rscript -e "library(IRkernel); IRkernel::installspec(name='NAME', displayname='R 3.6.1')"
## Use envkernel to re-write, loading the R modules.
$ envkernel conda --user --kernel-template=NAME --name=NAME $CONDA_PREFIX
Installing a different R version as a kernel
There are two ways to install a different R version kernel for jupyter. One relies on you building your own conda environment; the disadvantage is that you will need to create a kernel, the advantage is that you can add additional packages. The other option is to use the existing R installations on Triton.
You will need to create your own conda environment with all packages that are necessary to deploy the environment as a kernel:
## Load miniconda before creating your environment - this provides mamba, which is used to create your environment
$ module load miniconda
Create your conda environment, selecting a NAME for the environment:
## This will use the latest R version on conda-forge. If you need a specific version you can specify it
## as r-essentials=X.X.X, where X.X.X is your required R version number
$ mamba create -n NAME -c conda-forge r-essentials r-irkernel
## If Mamba doesn't work you can also replace it with conda, but usually mamba is a lot faster
The next steps are the same as building a kernel, except that you activate the environment instead of loading the r-irkernel module, since that module depends on the R version. The displayname is what will be displayed in jupyter.
## Use Rscript to install jupyter kernel, you need the environment for this.
## You need the Python `jupyter` command so R can know the right place to
## install the kernel (provided by jupyterhub/live)
$ module load jupyterhub/live
$ source activate NAME
$ Rscript -e "library(IRkernel); IRkernel::installspec(name='NAME', displayname='YOUR R Version')"
$ conda deactivate
## For R versions before 4, you need to install the kernel. After version 4 IRkernel automatically installs it.
$ envkernel lmod --user --kernel-template=NAME --name=NAME
For the second option, using the existing R installations on Triton, first load the R version you want to deploy as a kernel:
$ module spider r
## Select one of the displayed R versions and load it with the following line
$ module load r/THE_VERSION_YOU_WANT
Start R and install the IRkernel package.
## start R
$ R
## In R install the IRkernel package (to your home directory)
install.packages('IRkernel')
## exit R again
Create the installation specs using Rscript and IRkernel, then install the jupyter kernel. Select a NAME for the kernel specification; the NAME is what the kernel will be referred to during installation, while DISPLAYNAME is what will be displayed in jupyter:
## Use Rscript to install the jupyter kernel. The jupyterhub/live module is required to point R at the right place for jupyter
$ module load jupyterhub/live
$ Rscript -e "library(IRkernel); IRkernel::installspec(name='NAME', displayname='DISPLAYNAME')"
## For R versions before 4, you need to install the kernel. After version 4 IRkernel automatically installs it.
$ envkernel lmod --user --kernel-template=NAME --name=IMAGENAME YOURRMODULE
## YOURRMODULE should match the module you loaded above (THE_VERSION_YOU_WANT above)
Note
Installing R packages for jupyter
Installing packages via jupyter can be problematic, as it requires interactivity, which jupyter does not readily support. To install packages, therefore, go directly to Triton. Load the environment or R module you use and install the packages interactively. After that is done, restart your jupyter session and reload your kernel; all packages that you installed should then be available.
Install your own kernels from other Python modules
This works if the module provides the command python and ipykernel is installed. This has to be done once in any Triton shell:
$ module load jupyterhub/live
$ envkernel lmod --user --name INTERNAL_NAME --display-name="Python from my module" MODULE_NAME
$ module purge
Install your own kernels from Singularity image
First, find the .simg file name. If you are using this from one of the Triton modules, you can use module show MODULE_NAME and look for SING_IMAGE in the output.
Then, install a kernel for your own user using envkernel. This has to be done once in any Triton shell:
$ module load jupyterhub/live
$ envkernel singularity --user --name KERNEL_NAME --display-name="Singularity my kernel" SIMG_IMAGE
$ module purge
As with the above, the image has to provide a python command and have ipykernel installed (assuming you want to use Python; other kernels have different requirements).
Julia
Julia: currently doesn’t seem to play nicely with global installations (so we can’t install it for you, if anyone knows something otherwise, let us know). Roughly, these steps should work to install the kernel yourself:
$ module load julia
$ module load jupyterhub/live
$ julia
julia> using Pkg
julia> Pkg.add("IJulia")
If this doesn’t work, it may think it is already installed. Force it with this:
julia> using IJulia
julia> installkernel("julia")
Install your own non-Python kernels
First, module load jupyterhub/live. This loads the anaconda environment which contains all the server code and configuration. (This step may not be needed for all kernels.)
Follow the instructions you find for your kernel. You may need to specify --user or some such to have it install in your user directory.
You can check your own kernels in ~/.local/share/jupyter/kernels/.
If your kernel involves loading a module, you can either a) load the modules within the notebook server (“softwares” tab in the menu), or b) update your kernel.json to include the required environment variables (see kernelspec). (We need to do some work to figure out just how this works.) Check /share/apps/jupyterhub/live/miniconda/share/jupyter/kernels/ir/kernel.json for an example of a kernel that loads a module first.
From Jupyter notebooks to running on the queue
While jupyter is great for interactively running code, it can become a problem if you need to run multiple parameter sets through a jupyter notebook, or if you need a specific resource which is not available for jupyter. The latter might be because the resource is scarce enough that an open jupyter session, having finished one part and waiting for the user to start the next, idly blocks the resource. At this point you will likely want to move your code to pure python and run it via the queue.
Here are the steps necessary to do so:
Log into Triton via ssh (tutorials can be found here and here).
In the resulting terminal session, load the jupyterhub module to have jupyter available (module load jupyterhub).
Navigate to the folder where your jupyter notebooks are located. You can see the path by moving your mouse over the files tab in jupyterlab.
Convert the notebook(s) you want to run on the cluster (jupyter nbconvert yourScriptName.ipynb --to python).
If you need to run your code for multiple different parameters, modify the python code to allow input parameter parsing (e.g. using argparse or docopt). You should include both input and output arguments, as you want to save files to different result folders or give them indicative filenames. There are two main reasons for this approach: A) it makes your code more maintainable, since you don't need to modify the code when changing parameters, and B) you are less likely to use the wrong version of your code (and thus get the wrong results).
(Optional) Set up a conda environment. This is mainly necessary if you have multiple conda- or pip-installable packages that are required for your job and which are not part of the normal anaconda module. Try it via module load anaconda. You can't install into the anaconda environment provided by the anaconda module, and you should NOT use pip install --user as it will bite you later (and can cause difficult-to-debug problems). If you need to set up your own environment, follow this guide.
Set up a slurm batch script in a file, e.g. simple_python_gpu.sh. You can do this either with nano simple_python_gpu.sh (to save the file press ctrl+x, type y to save the file and press Enter to accept the file name), or you can mount the triton file system and use your favorite editor (for guides on how to mount the file system, have a look here and here). Depending on your OS, it might be difficult to mount home, and it is anyway best practice to use /scratch/work/USERNAME for your code. Here is an example:
#!/bin/bash
#SBATCH --cpus-per-task 1   # The number of CPUs your code can use; if in doubt, use 1 for CPU-only code or 6 if you run on GPUs (code running on GPUs commonly allows parallelization of data provision to the GPU)
#SBATCH --mem 10G           # The amount of memory you expect your code to need; format is 10G for 10 gigabyte, 500M for 500 megabyte, etc.
#SBATCH --time=01:00:00     # Time in HH:MM:SS or DD-HH of your job; the maximum is 120 hours or 5 days
#SBATCH --gres=gpu:1        # Additional specific resources can be requested via gres; mainly used for requesting GPUs, format is gres=ResourceType:Number
module load anaconda        # or module load miniconda if you use your own environment
source activate yourEnvironment  # if you use your own environment
python yourScriptName.py ARG
This is a minimalistic example. If you have parameter sets that you want to use, have a look at array jobs here.
Submit your batch script to the queue: sbatch simple_python_gpu.sh. This call will print a message like: Submitted batch job <jobid>. You can use e.g. slurm q to see your current jobs and their status in the queue, or monitor your jobs as described here. A condensed sketch of the whole workflow is shown after this list.
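As referenced above, a condensed sketch of the conversion-and-submission workflow in a Triton shell (the notebook and script names are placeholders):
# Convert the notebook to a plain Python script
module load jupyterhub
jupyter nbconvert yourScriptName.ipynb --to python
# Submit the batch script that runs the converted script
sbatch simple_python_gpu.sh
# Check the job's status in the queue
slurm q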
Git integration
You can enable git integration on Triton by using the following
lines from inside a git repository. (This is normal nbdime, but uses
the centrally installed one so that you don’t have to load a
particular conda environment first. The sed
command fixes
relative paths to absolute paths, so that you use the tools no matter
what modules you have loaded):
$ /share/apps/jupyterhub/live/miniconda/bin/nbdime config-git --enable
$ sed --in-place -r 's@(= )[ a-z/-]*(git-nb)@\1/share/apps/jupyterhub/live/miniconda/bin/\2@' .git/config
FAQ/common problems
Jupyterhub won't spawn my server: "Error: HTTP 500: Internal Server Error (Spawner failed to start [status=1])". Is your home directory quota exceeded? If that's not it, check the ~/jupyterhub_slurmspawner_* logs, then contact us.
My server has died mysteriously. This may happen if resource usage becomes too much and exceeds the limits - Slurm will kill your notebook. You can check the ~/jupyterhub_slurmspawner_* log files for jupyterhub to be sure.
My server seems inaccessible / I can't get to the control panel to restart my server. Especially with JupyterLab. In JupyterLab, use File→Hub Control Panel. If you can't get there, you can change the URL to /hub/home.
My R kernel keeps dying. Some people have global R configuration, either in .bashrc or .Renviron or some such, which even affects the R kernel here. Things we have seen: pre-loading modules in .bashrc which conflict with the kernel R module; changing RLIBS in .Renviron. You can either (temporarily or permanently) remove these changes, or you could install your own R kernel. If you install your own, it is up to you to maintain it (and remember that you installed it).
"Spawner pending" when you try to start - this is hopefully fixed in issue #1534/#1533 in JupyterHub. Current recommendation: wait a bit and return to the JupyterHub home page to see if the server has started. Don't click the button twice!
See also
Online demos and live tutorial: https://jupyter.org/try (use the Python one)
Jupyter basic tutorial: https://www.youtube.com/watch?v=HW29067qVWk (this is just the first link on youtube - there are many more too)
More advanced tutorial: Data Science is Software (this is not just a Jupyter tutorial, but about the whole data science workflow using Jupyter. It is annoyingly long (2 hours), but very complete and could be considered good "required watching")
CSC has this service, too; however, there is no long-term storage yet, so it is of limited usefulness for research: https://notebooks.csc.fi/
Our configuration is available on Github. Theoretically, all the pieces are here but it is not yet documented well and not yet generalizable. The Ansible role is a good start but the jupyterhub config and setup is hackish.
Ansible config role: https://github.com/AaltoSciComp/ansible-role-fgci-jupyterhub
Configuration and automated conda environment setup: https://github.com/AaltoSciComp/triton-jupyterhub
Keras
- pagelastupdated: 2020-02-20
Keras is a neural network library which runs on tensorflow (among other things).
Basic usage
Keras is available in the anaconda module and some other anaconda modules. Run module spider anaconda to list available modules.
You probably want to learn how to run in the GPU queues. The other information on the tensorflow page also applies, especially the --constraint options to restrict to the GPUs that have new enough features.
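As a sketch of such a restriction (the feature name 'volta' below is only illustrative; check the tensorflow page for the actual feature names on Triton):
module load anaconda
# Request one GPU, restricted by a node feature constraint
srun --gres=gpu:1 --constraint='volta' --pty bash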
Example
srun --gres=gpu:1 --pty bash
module load anaconda
python3
>>> import keras
Using TensorFlow backend.
>>> keras.__version__
'2.2.4'
LAMMPS
- pagelastupdated: 2023-02-08
LAMMPS is a classical molecular dynamics simulation code with a focus on materials modeling.
Building a basic version of LAMMPS
LAMMPS is typically built based on the specific needs of the simulation. When building LAMMPS one can enable and disable various packages that add commands available when LAMMPS is run.
LAMMPS has an extensive guide on how to build LAMMPS. The recommended way of building LAMMPS is with CMake.
Below are instructions on how to do a basic build of LAMMPS with OpenMP and MPI parallelizations enabled.
One can obtain the LAMMPS source code either from the LAMMPS download page or from the LAMMPS source repository. Here we'll be using the version 23Jun2022.
# Obtain source code and go to the code folder
wget https://download.lammps.org/tars/lammps-23Jun2022.tar.gz
tar xf lammps-23Jun2022.tar.gz
cd lammps-23Jun2022
# Create a build folder and go to it
mkdir build
cd build
# Activate CMake and OpenMPI modules needed by LAMMPS
module load cmake gcc/11.3.0 openmpi/4.1.5
# Configure LAMMPS packages and set install folder
cmake ../cmake -D BUILD_MPI=yes -D BUILD_OMP=yes -D CMAKE_INSTALL_PREFIX=../../lammps-mpi-23Jun2022
# Build LAMMPS
make -j 2
# Install LAMMPS
make install
# Go back to starting folder
cd ../..
# Add installed LAMMPS to the executable search path
export PATH=$PATH:$PWD/lammps-mpi-23Jun2022/bin
Now we can verify that we have a working LAMMPS installation with the following command:
echo "info configuration" | srun lmp
The output should look something like this:
srun: job 11839786 queued and waiting for resources
srun: job 11839786 has been allocated resources
LAMMPS (23 Jun 2022 - Update 2)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
Info-Info-Info-Info-Info-Info-Info-Info-Info-Info-Info
Printed on Thu Jan 19 17:20:21 2023
LAMMPS version: 23 Jun 2022 / 20220623
OS information: Linux "CentOS Linux 7 (Core)" 3.10.0-1160.71.1.el7.x86_64 x86_64
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint): 32-bit
sizeof(bigint): 64-bit
Compiler: GNU C++ 8.4.0 with OpenMP 4.5
C++ standard: C++11
Active compile time flags:
-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_SMALLBIG
Available compression formats:
Extension: .gz Command: gzip
Extension: .bz2 Command: bzip2
Extension: .xz Command: xz
Extension: .lzma Command: xz
Extension: .lz4 Command: lz4
Installed packages:
Info-Info-Info-Info-Info-Info-Info-Info-Info-Info-Info
Total wall time: 0:00:00
Building a version of LAMMPS with most packages
Many packages in LAMMPS need other external libraries, such as BLAS and FFTW. These extra libraries can be given to LAMMPS via flags mentioned in this documentation, but in most cases loading the appropriate modules from the module system is enough for CMake to find the libraries.
To include extra packages in the build one can either use flags mentioned in this documentation or one can use developer maintained CMake presets for installing a collection of packages.
Below is an example that installs LAMMPS with “most packages”-collection enabled:
# Obtain source code and go to the code folder
wget https://download.lammps.org/tars/lammps-23Jun2022.tar.gz
tar xf lammps-23Jun2022.tar.gz
cd lammps-23Jun2022
# Create a build folder and go to it
mkdir build
cd build
# Activate CMake and OpenMPI modules needed by LAMMPS
module load cmake gcc/11.3.0 openmpi/4.1.5 fftw/3.3.10 openblas/0.3.23 eigen/3.4.0 ffmpeg/6.0 voropp/0.4.6 zstd/1.5.5
# Configure LAMMPS packages and set install folder
cmake ../cmake -C ../cmake/presets/most.cmake -D BUILD_MPI=yes -D BUILD_OMP=yes -D CMAKE_INSTALL_PREFIX=../../lammps-mpi-most-23Jun2022
# Build LAMMPS
make -j 2
# Install LAMMPS
make install
# Go back to starting folder
cd ../..
# Add installed LAMMPS to the executable search path
export PATH=$PATH:$PWD/lammps-mpi-most-23Jun2022/bin
Now we can verify that we have a working LAMMPS installation with the following command:
echo "info configuration" | srun lmp
The output should look something like this:
srun: job 13235690 queued and waiting for resources
srun: job 13235690 has been allocated resources
LAMMPS (23 Jun 2022 - Update 2)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
Info-Info-Info-Info-Info-Info-Info-Info-Info-Info-Info
Printed on Tue Feb 07 11:41:05 2023
LAMMPS version: 23 Jun 2022 / 20220623
OS information: Linux "CentOS Linux 7 (Core)" 3.10.0-1160.71.1.el7.x86_64 x86_64
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint): 32-bit
sizeof(bigint): 64-bit
Compiler: GNU C++ 8.4.0 with OpenMP 4.5
C++ standard: C++11
Active compile time flags:
-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_FFMPEG
-DLAMMPS_SMALLBIG
Available compression formats:
Extension: .gz Command: gzip
Extension: .bz2 Command: bzip2
Extension: .xz Command: xz
Extension: .lzma Command: xz
Extension: .lz4 Command: lz4
Installed packages:
ASPHERE BOCS BODY BPM BROWNIAN CG-DNA CG-SDK CLASS2 COLLOID COLVARS COMPRESS
CORESHELL DIELECTRIC DIFFRACTION DIPOLE DPD-BASIC DPD-MESO DPD-REACT
DPD-SMOOTH DRUDE EFF ELECTRODE EXTRA-COMPUTE EXTRA-DUMP EXTRA-FIX
EXTRA-MOLECULE EXTRA-PAIR FEP GRANULAR INTERLAYER KSPACE MACHDYN MANYBODY MC
MEAM MISC ML-IAP ML-SNAP MOFFF MOLECULE OPENMP OPT ORIENT PERI PHONON PLUGIN
POEMS QEQ REACTION REAXFF REPLICA RIGID SHOCK SPH SPIN SRD TALLY UEF VORONOI
YAFF
Info-Info-Info-Info-Info-Info-Info-Info-Info-Info-Info
Total wall time: 0:00:00
Examples
LAMMPS indent-example
Let’s run a simple example from LAMMPS examples. This specific model represents a spherical indenter into a 2D solid.
First, we need to get the example:
# Obtain source code and go to the code folder
wget https://download.lammps.org/tars/lammps-23Jun2022.tar.gz
tar xf lammps-23Jun2022.tar.gz
cd lammps-23Jun2022/examples/indent/
After this we can launch LAMMPS with a slurm script like this:
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --output=lammps_indent.out
# Load modules used for building the LAMMPS binary
module load cmake gcc/11.3.0 openmpi/4.1.5 fftw/3.3.10 openblas/0.3.23 eigen/3.4.0 ffmpeg/6.0 voropp/0.4.6 zstd/1.5.5
# Set path to LAMMPS executable
export PATH=$PATH:$PWD/../../../lammps-mpi-most-23Jun2022/bin
# Run simulation
srun lmp < in.indent
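Since the build above enabled both MPI and OpenMP, a hybrid run is also possible. A sketch, assuming the omp suffix styles from the OPENMP package listed in the build above (the task and thread counts are only examples):
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=2
#SBATCH --output=lammps_indent_hybrid.out
# Load modules used for building the LAMMPS binary
module load cmake gcc/11.3.0 openmpi/4.1.5 fftw/3.3.10 openblas/0.3.23 eigen/3.4.0 ffmpeg/6.0 voropp/0.4.6 zstd/1.5.5
# Set path to LAMMPS executable
export PATH=$PATH:$PWD/../../../lammps-mpi-most-23Jun2022/bin
# Use the reserved CPUs per task as OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# -sf omp switches pair/fix styles to their OpenMP variants
srun lmp -sf omp -in in.indent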
LLMs
Large-language models are AI models that can understand and generate text, primarily using transformer architectures. This page is about running them on a local HPC cluster. This requires extensive programming experience and knowledge of using the cluster (Tutorials), but allows maximum computational power for the least cost. Aalto RSE maintains these models and can provide help in using these, even to people who aren’t computational experts.
Because the models are typically very large and there are many people interested in them, we provide our users with pre-downloaded model weights and this page has instructions on how to load these weights for inference purposes or for retraining and fine-tuning the models.
Pre-downloaded model weights
Raw model weights
We have downloaded the following raw model weights (PyTorch model checkpoints):
Model type | Model version | Module command to load | Description
---|---|---|---
Llama 2 | Raw Data |  | Raw weights of Llama 2.
Llama 2 | 7b |  | Raw weights of the 7B parameter version of Llama 2.
Llama 2 | 7b-chat |  | Raw weights of the 7B parameter chat optimized version of Llama 2.
Llama 2 | 13b |  | Raw weights of the 13B parameter version of Llama 2.
Llama 2 | 13b-chat |  | Raw weights of the 13B parameter chat optimized version of Llama 2.
Llama 2 | 70b |  | Raw weights of the 70B parameter version of Llama 2.
Llama 2 | 70b-chat |  | Raw weights of the 70B parameter chat optimized version of Llama 2.
CodeLlama | Raw Data |  | Raw weights of CodeLlama.
CodeLlama | 7b |  | Raw weights of the 7B parameter version of CodeLlama.
CodeLlama | 7b-Python |  | Raw weights of the 7B parameter version of CodeLlama, specifically designed for Python.
CodeLlama | 7b-Instruct |  | Raw weights of the 7B parameter version of CodeLlama, designed for instruction following.
CodeLlama | 13b |  | Raw weights of the 13B parameter version of CodeLlama.
CodeLlama | 13b-Python |  | Raw weights of the 13B parameter version of CodeLlama, specifically designed for Python.
CodeLlama | 13b-Instruct |  | Raw weights of the 13B parameter version of CodeLlama, designed for instruction following.
CodeLlama | 34b |  | Raw weights of the 34B parameter version of CodeLlama.
CodeLlama | 34b-Python |  | Raw weights of the 34B parameter version of CodeLlama, specifically designed for Python.
CodeLlama | 34b-Instruct |  | Raw weights of the 34B parameter version of CodeLlama, designed for instruction following.
Each module will set the following environment variables:
MODEL_ROOT - Folder where model weights are stored, i.e., the PyTorch model checkpoint directory.
TOKENIZER_PATH - File path to the tokenizer.model.
Here is an example slurm script using the raw weights to do batch inference. For detailed environment setup, example prompts, and Python code, please check out this repo.
#!/bin/bash
#SBATCH --time=00:25:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=20GB
#SBATCH --gres=gpu:1
#SBATCH --output=llama2inference-gpu.%J.out
#SBATCH --error=llama2inference-gpu.%J.err
# get the model weights
module load model-llama2/7b
echo $MODEL_ROOT
# Expect output: /scratch/shareddata/dldata/llama-2/llama-2-7b
echo $TOKENIZER_PATH
# Expect output: /scratch/shareddata/dldata/llama-2/tokenizer.model
# activate your conda environment
module load miniconda
source activate llama2env
# run batch inference
torchrun --nproc_per_node 1 batch_inference.py \
--prompts prompts.json \
--ckpt_dir $MODEL_ROOT \
--tokenizer_path $TOKENIZER_PATH \
--max_seq_len 512 --max_batch_size 16
Model weight conversions
Usually, models produced in research are stored as weights from PyTorch or other frameworks. For inference, we also have models that are already converted to different formats.
Huggingface Models
The following Huggingface models are stored on Triton. A full list of all the available models is located at /scratch/shareddata/dldata/huggingface-hub-cache/models.txt. Please contact us if you need any other models.
Model type | Huggingface model identifier
---|---
Text Generation | mistralai/Mistral-7B-v0.1
Text Generation | mistralai/Mistral-7B-Instruct-v0.1
Text Generation | tiiuae/falcon-7b
Text Generation | tiiuae/falcon-7b-instruct
Text Generation | tiiuae/falcon-40b
Text Generation | tiiuae/falcon-40b-instruct
Text Generation | google/gemma-2b-it
Text Generation | google/gemma-7b
Text Generation | google/gemma-7b-it
Text Generation | LumiOpen/Poro-34B
Text Generation | meta-llama/Llama-2-7b-hf
Text Generation | meta-llama/Llama-2-13b-hf
Text Generation | meta-llama/Llama-2-70b-hf
Text Generation | codellama/CodeLlama-7b-hf
Text Generation | codellama/CodeLlama-13b-hf
Text Generation | codellama/CodeLlama-34b-hf
Translation | Helsinki-NLP/opus-mt-en-fi
Translation | Helsinki-NLP/opus-mt-fi-en
Translation | t5-base
Fill Mask | bert-base-uncased
Fill Mask | bert-base-cased
Fill Mask | distilbert-base-uncased
Text to Speech | microsoft/speecht5_hifigan
Text to Speech | facebook/hf-seamless-m4t-large
Automatic Speech Recognition | openai/whisper-large-v3
Token Classification | dslim/bert-base-NER-uncased
All Huggingface models can be loaded with module load model-huggingface/all.
Here is a Python script using a Huggingface model.
## Force transformer to load model(s) from local hub instead of download and load model(s) from remote hub. NOTE: this must be run before importing transformers.
import os
os.environ['TRANSFORMERS_OFFLINE'] = '1'
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
prompt = "How many stars in the space?"
model_inputs = tokenizer([prompt], return_tensors="pt")
input_length = model_inputs.input_ids.shape[1]
generated_ids = model.generate(**model_inputs, max_new_tokens=20)
print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
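To run such a script through the queue, a minimal batch script might look like the following sketch (the script name hf_generate.py, the environment name, and the resource numbers are placeholders; the module name comes from above):
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --mem=40G
#SBATCH --gres=gpu:1
#SBATCH --output=hf-generate.%J.out
# Make the pre-downloaded Huggingface models visible
module load model-huggingface/all
# Activate your own conda environment with transformers installed
module load miniconda
source activate myhfenv
python hf_generate.py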
llama.cpp and GGUF
llama.cpp is a popular framework for running inference on LLM models with CPUs or GPUs. llama.cpp uses a format called GGUF as its storage format.
We have llama.cpp conversions of all Llama 2 and CodeLlama models with multiple quantization levels.
NOTE: Before loading the following modules, one must first load a module for the raw model weights. For example, run module load model-codellama/34b first, and then run module load codellama.cpp/q8_0-2023-12-04 to get the 8-bit integer version of the CodeLlama weights in a .gguf file.
Model type | Model version | Module command to load | Description
---|---|---|---
Llama 2 | f16-2023-08-28 |  | Half precision version of Llama 2 weights done with llama.cpp on 4th of Dec 2023.
Llama 2 | q4_0-2023-08-28 |  | 4-bit integer version of Llama 2 weights done with llama.cpp on 4th of Dec 2023.
Llama 2 | q4_1-2023-08-28 |  | 4-bit integer version of Llama 2 weights done with llama.cpp on 4th of Dec 2023.
Llama 2 | q8_0-2023-08-28 |  | 8-bit integer version of Llama 2 weights done with llama.cpp on 4th of Dec 2023.
CodeLlama | f16-2023-08-28 |  | Half precision version of CodeLlama weights done with llama.cpp on 4th of Dec 2023.
CodeLlama | q4_0-2023-08-28 |  | 4-bit integer version of CodeLlama weights done with llama.cpp on 4th of Dec 2023.
CodeLlama | q8_0-2023-08-28 |  | 8-bit integer version of CodeLlama weights done with llama.cpp on 4th of Dec 2023.
Each module will set the following environment variables:
MODEL_ROOT - Folder where model weights are stored.
MODEL_WEIGHTS - Path to the model weights in GGUF file format.
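For example, combining the two module loads described in the note above and checking the resulting variable (a sketch; the module names are taken from that note):
# First the raw weights, then the llama.cpp conversion on top
module load model-codellama/34b
module load codellama.cpp/q8_0-2023-12-04
# The GGUF file to pass to llama.cpp-based tools
echo $MODEL_WEIGHTS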
This Python code snippet is part of a ‘Chat with Your PDF Documents’ example, utilizing LangChain and leveraging model weights stored in a .gguf file. For detailed environment setting up and Python code, please check out this repo.
import os
from langchain.llms import LlamaCpp
model_path = os.environ.get('MODEL_WEIGHTS')
llm = LlamaCpp(model_path=model_path, verbose=False)
More examples
Starting a local API
With the pre-downloaded model weights, you are also able to create an API endpoint locally. For detailed examples, you can check out this repo.
Using Mathematica on Triton
Load Mathematica
Mathematica is loaded through a module:
module load mathematica
See available versions with module avail mathematica.
You can test by running in text-based mode:
$ wolfram
With graphical user interface
To launch the graphical user interface (GUI), login to triton.aalto.fi with -X, i.e. X11 forwarding enabled.
ssh -X triton.aalto.fi
If you need to run computationally-intensive things with the GUI, use sinteractive to get an interactive shell on a node:
sinteractive --mem=10G --time=1:00
Either way, you start the GUI with mathematica:
$ mathematica &
Running batch scripts
Create a script file, say script.m. You can run this script and store the outputs in output.txt using:
math -noprompt -run '<<script.m' > output.txt
To put this in a batch script, simply look at the serial jobs tutorial. Here is one such example:
#!/bin/bash
#SBATCH --mem=5G
#SBATCH --time=2:00
module load mathematica
math -noprompt -run '<<script.m'
Common problems
Activation: If you need to activate Mathematica when you first run it, we recommend that you launch it in GUI mode first, choose "Other ways to activate", then "Connect to a network license server", and paste lic-mathematica.aalto.fi. It should be automatically activated, though; if not, file an issue and link this page.
See also
Various other references also apply here once you load the module and adapt them to Slurm.
Admin notes
When installing new versions, put !lic-mathematica.aalto.fi into Configuration/Licensing/mathpass in the base directory.
Matlab
This page explains how to run Matlab jobs on Triton and introduces important details about Matlab on Triton. (Note: We used to have the Matlab Distributed Computing Server (MDCS), but because of low use we no longer have a license. You can still run in parallel on one node, with up to 40 cores.)
Important notes
Matlab writes session data, compiled code and additional toolboxes to ~/.matlab. This can quickly fill up your $HOME quota. To fix this we recommend that you replace the folder with a symlink that points to a directory in your working directory:
rsync -lrt ~/.matlab/ $WRKDIR/matlab-config/ && rm -r ~/.matlab
ln -sT $WRKDIR/matlab-config ~/.matlab
quotafix -gs --fix $WRKDIR/matlab-config
If you run parallel code in Matlab, keep in mind that Matlab uses your home folder as storage for the worker files, so if you run multiple jobs you have to keep the worker folders separate. To address this, you need to set the worker location (the JobStorageLocation field of the parallel cluster) to a location unique to the job:
% Initialize the parallel pool
c=parcluster();
% Create a temporary folder for the workers working on this job,
% in order not to conflict with other jobs.
t=tempname();
mkdir(t);
% set the worker storage location of the cluster
c.JobStorageLocation=t;
In addition, the number of parallel workers needs to be explicitly provided when initializing the parallel pool:
% get the number of workers based on the available CPUS from SLURM
num_workers = str2double(getenv('SLURM_CPUS_PER_TASK'));
% start the parallel pool
parpool(c,num_workers);
Here we provide a small script that does all those steps for you.
Interactive usage
Interactive usage is currently available via the sinteractive tool. Do not use the cluster front-end for this, but connect to a node with sinteractive; the login node is only meant for submitting jobs and compiling.
To run an interactive session with a user interface run the following commands from a terminal.
ssh -X user@triton.aalto.fi
sinteractive
module load matlab
matlab &
Simple serial script
Running a simple Matlab job is easy through the slurm queue. A sample slurm script is provided below:
#!/bin/bash -l
#SBATCH --time=00:05:00
#SBATCH --mem=100M
#SBATCH -o serial_Matlab.out
module load matlab
n=3
m=2
srun matlab -nojvm -nosplash -r "serial_Matlab($n,$m) ; exit(0)"
The above script can then be saved as a file (e.g. matlab_test.sh) and the job can be submitted with sbatch matlab_test.sh. The actual calculation is done in the serial_Matlab.m file:
function C = serial_Matlab(n,m)
try
A=0:(n*m-1);
A=reshape(A,[2,3]).'
B=2:(n*m+1);
B=reshape(B,[2,3]).'
C=0.5*ones(n,n)
C=A*(B.') + 2.0*C
catch error
disp(getReport(error))
exit(1)
end
end
Remember to always put exit into your slurm script's Matlab command so that Matlab quits once the function serial_Matlab has finished. Using a try-catch statement will allow your job to finish in case of any error within the program. If you don't do this, Matlab will drop into interactive mode and do nothing while your job wastes time.
NOTE: Starting from version r2019a, the launch options -r ...; exit(0) can easily be replaced with the -batch option, which automatically exits Matlab at the end of the command that is passed (see here for details). So the last command from the slurm script above for Matlab r2019a will look like:
srun matlab -nojvm -nosplash -batch "serial_Matlab($n,$m);"
Running Matlab Array jobs
The most common way to utilize Matlab is to write a single .M-file that can be used to run tasks as a non-interactive batch job. These jobs are then submitted as independent tasks and when the heavy part is done, the results are collected for analysis. For these kinds of jobs the Slurm array jobs is the best choice; For more information on array jobs see Array jobs in the Triton user guide.
Here is an example of testing multiple mutation rates for a genetic algorithm. First, the Matlab code:
% set the mutation rate
mutationRate = str2double(getenv('SLURM_ARRAY_TASK_ID'))/100;
opts = optimoptions('ga','MutationFcn', {@mutationuniform, mutationRate});
% Set population size and end criteria
opts.PopulationSize = 100;
opts.MaxStallGenerations = 50;
opts.MaxGenerations = 200000;
%set the range for all genes
opts.InitialPopulationRange = [-20;20];
% define number of variables (genes)
numberOfVariables = 6;
[x,Fval,exitFlag,Output] = ga(@fitness,numberOfVariables,[],[],[], ...
[],[],[],[],opts);
output = [4,-2,3.5,5,-11,-4.7] * x'
save(['MutationJob' getenv('SLURM_ARRAY_TASK_ID') '.mat'], 'output');
exit(0)
function fit = fitness(x)
output = [4,-2,3.5,5,-11,-4.7] * x';
fit = abs(output - 44);
end
We run this code with the following slurm script using sbatch:
#!/bin/bash
#SBATCH --time=00:30:00
#SBATCH --array=1-100
#SBATCH --mem=500M
#SBATCH --output=r_array_%a.out
module load matlab
srun matlab -nodisplay -r serial
Collecting the results
Finally, a wrapper script reads in the .mat files and plots the resulting values:
function collectResults(maxMutationRate)
X=1:maxMutationRate
Y=zeros(maxMutationRate,1);
for index=1:maxMutationRate
% read the output from the jobs
filename = strcat( 'MutationJob', int2str( index ) );
load( filename );
Y(index)=output;
end
plot(X,Y,'b+:')
Seeding the random number generator
Note that by default MATLAB always initializes the random number
generator with a constant value. Thus if you launch several matlab
instances e.g. to calculate distinct ensembles, then you need to seed
the random number generator such that it’s distinct for each
instance. In order to do this, you can call the rng()
function,
passing the value of $SLURM_ARRAY_TASK_ID
to it.
Parallel Matlab with Matlab’s internal parallelization
Matlab has internal parallelization that can be activated by requesting
more than one cpu per task in the
Slurm script
and using the matlab_multithread
to start the interpreter.
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=500M
#SBATCH --cpus-per-task=4
#SBATCH --output=ParallelOut
module load matlab
srun matlab_multithread -nodisplay -r parallel_fun
An example function is provided in this script.
Parallel Matlab with parpool
Often one uses Matlab's parallel pool for parallelization. When using parpool one needs to specify the number of workers. This number should match the number of CPUs requested. parpool uses the JVM, so when launching the interpreter one needs to use -nodisplay instead of -nojvm. Example:
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --mem=500M
#SBATCH --cpus-per-task=4
#SBATCH --output=matlab_parallel.out
module load matlab
srun matlab_multithread -nodisplay -r parallel
An example function is provided in this script:
initParPool()
% Create matrices to invert
mat = rand(1000,1000,6);
parfor i=1:size(mat,3)
invMats(:,:,i) = inv(mat(:,:,i))
end
% And now, we proceed to build the averages of each set of inverted matrices
% each time leaving out one.
parfor i=1:size(invMats,3)
usedelements = true(size(invMats,3),1)
usedelements(i) = false
res(:,:,i) = inv(mean(invMats(:,:,usedelements),3));
end
% end the program
exit(0)
Parallel matlab in exclusive mode
#!/bin/bash -l
#SBATCH --time=00:15:00
#SBATCH --exclusive
#SBATCH -o parallel_Matlab3.out
export OMP_NUM_THREADS=$(nproc)
module load matlab/r2017b
matlab_multithread -nosplash -r "parallel_Matlab3($OMP_NUM_THREADS) ; exit(0)"
parallel_Matlab3.m:
function parallel_Matlab3(n)
% Try-catch expression that quits the Matlab session if your code crashes
try
% Initialize the parallel pool
c=parcluster();
% Ensure that workers don't overlap with other jobs on the cluster
t=tempname()
mkdir(t)
c.JobStorageLocation=t;
parpool(c,n);
% The actual program calls from matlab's example.
% The path for r2017b
addpath(strcat(matlabroot, '/examples/distcomp/main'));
% The path for r2016b
% addpath(strcat(matlabroot, '/examples/distcomp'));
pctdemo_aux_parforbench(10000,100,n);
catch error
% Print the error report so the cause is visible in the job output
disp(getReport(error))
disp('Error occurred');
exit(0)
end
end
FAQ / troubleshooting
- If things randomly don't work, you can try removing or moving either the ~/.matlab directory or the ~/.matlab/Rxxxxy directory, to see if the problem is caused by configuration.
- Random error messages about things not loading and/or something (Matlab Live Editor maybe) doesn't work: run ls *.m in your home directory; do you have any unexpected files like pathdef.m in there? Remove them.
- Also, check your home quota. Often .matlab gets large and fills up your home directory. Check the answer at the very top of the page, under "Matlab Configuration".
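A sketch of checking and working around this (the same symlink trick used for R packages later on this page; adapt the paths to your own setup):
# How much space is the MATLAB configuration using?
du -sh ~/.matlab
# If it is large, move it to your work directory and leave a symlink behind
mv ~/.matlab $WRKDIR/.matlab
ln -s $WRKDIR/.matlab ~/.matlab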
MLPack
- pagelastupdated:
2014
- supportlevel:
C
module load cmake; module load armadillo/4.3-mkl; module load mkl
mkdir build && cd build
cmake -D ARMADILLO_LIBRARY=$ARMADILLO_LIBRARY -D ARMADILLO_INCLUDE_DIR=$ARMADILLO_INCLUDE ../
make
bin/mlpack_test
make install CMAKE_INSTALL_PREFIX=/share/apps/mlpack/1.0.8
For a newer Boost library, also load the boost module and tell CMake where to find it:
module load boost
...
cmake -D BOOST_ROOT=$BOOST_ROOT -D ARMADILLO_LIBRARY=$ARMADILLO_LIBRARY -D ARMADILLO_INCLUDE_DIR=$ARMADILLO_INCLUDE ../
Notes
The 1.0.10 installation failed when installing docs to /usr/local (install prefix defined as /share/apps/mlpack/1.0.10). The solution was to manually tune the install prefix in cmake_install.cmake.
MNE
- pagelastupdate:
2018
- maintainer:
module load mne
Follow the instructions to source the init script specific to your shell. In the directory:
$MNE_ROOT/..
you can find the release notes, the manual, and some sample data.
We do not recommend using the MNE command line tools; a more modern solution is the MNE-Python suite.
MPI
Message Passing Interface (MPI) is used in high-performance computing (HPC) clusters to facilitate big parallel jobs that utilize multiple compute nodes.
MPI and Slurm
For a tutorial on how to do Slurm reservations for MPI jobs, check out the MPI section of the parallel computing tutorial.
Installed MPI versions
There are multiple MPI versions installed on the cluster, but due to updates to the underlying network and the operating system, some older ones might not be functional.
It is therefore highly recommended to stick to the tested versions listed below.
Each MPI version will use some underlying compiler by default. Please check here for information on how to change the underlying compiler.
MPI provider | MPI version | GCC compiler | Module name | Extra notes
---|---|---|---|---
OpenMPI | 4.1.5 | gcc/11.3.0 | openmpi/4.1.5 |
OpenMPI | 4.0.5 | gcc/8.4.0 | openmpi/4.0.5 | There are known issues with this version; we do not recommend using it for new compilations
Some libraries/programs might already require a certain MPI version. If so, use that version, or ask the administrators to create a build of the library against the MPI version you require.
Warning
Different versions of MPI are not compatible with each other. Each version of MPI will create code that will run correctly with only that version of MPI. Thus if you create code with a certain version, you will need to load the same version of the library when you are running the code.
Also, the MPI libraries are usually linked to slurm and network drivers. Thus, when slurm or driver versions are updated, some older versions of MPI might break. If you’re still using said versions, let us know. If you’re just starting a new project, it is recommended to use our recommended MPI libraries.
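If you are unsure which MPI library an existing binary was linked against, you can inspect it with the standard ldd tool (a generic sketch, not a Triton-specific command):
# Load the MPI module you think the binary was built with, then
# list the dynamic libraries it resolves and pick out the MPI ones
module load openmpi/4.1.5
ldd ./hello_mpi | grep -i mpi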
Usage
Compiling and running an MPI Hello world-program
The following example uses example codes stored in the hpc-examples repository. You can get the repository with the following command:
git clone https://github.com/AaltoSciComp/hpc-examples/
Loading module:
module load gcc/11.3.0 # GCC
module load openmpi/4.1.5 # OpenMPI
Compiling the code:
C code is compiled with mpicc
:
cd hpc-examples/hello_mpi/
mpicc -O2 -g hello_mpi.c -o hello_mpi
Fortran code is compiled with mpifort
:
cd hpc-examples/hello_mpi_fortran/ # fortran
mpifort -O2 -g hello_mpi_fortran.f90 -o hello_mpi_fortran # Fortran code
For testing one might be interested in running the program with srun:
srun --time=00:05:00 --mem-per-cpu=200M --ntasks=4 ./hello_mpi
For actual jobs this is obviously not recommended as any problem with the login node can crash the whole MPI job. Thus we’ll want to run the program with a slurm script:
#!/bin/bash
#SBATCH --time=00:05:00 # takes 5 minutes all together
#SBATCH --mem-per-cpu=200M # 200MB per process
#SBATCH --ntasks=4 # 4 processes
module load openmpi/4.1.5 # NOTE: should be the same as you used to compile the code
srun ./hello_mpi
Important
It is important to use srun
when you launch your program.
This allows for the MPI libraries to obtain task placement information
(nodes, number of tasks per node etc.) from the slurm queue.
Overwriting default compiler of an MPI installation
Typically one should use the compiler that the MPI installation was compiled with, so if you would like to use a different compiler, it might be best to ask the administrators to install a version of MPI built with that compiler.
However, sometimes one can try to overwrite the default compiler. This is obviously faster than installing a new MPI version, but if you encounter problems after switching the compiler, you should not use it.
Changing the compiler when using OpenMPI
The procedure of changing compilers for OpenMPI is documented in
OpenMPI’s FAQ.
Environment variables such as OMPI_MPICC
and OMPI_MPIFC
can be
set to overwrite the default compiler. See the article for full list
of environment variables.
For example, one could use an Intel compiler to compile the Hello world! example by setting the OMPI_MPICC and OMPI_MPIFC environment variables.
Intel C compiler is icc
:
module load gcc/11.3.0
module load openmpi/4.1.5
module load intel-oneapi-compilers/2021.4.0
export OMPI_MPICC=icc # Overwrite the C compiler
mpicc -O2 -g hello_mpi.c -o hello_mpi
Intel Fortran compiler is ifort
:
module load gcc/11.3.0
module load openmpi/4.1.5
module load intel-oneapi-compilers/2021.4.0
export OMPI_MPIFC=ifort # Overwrite the Fortran compiler
mpifort -O2 -g hello_mpi_fortran.f90 -o hello_mpi_fortran
NVIDIA’s singularity containers
- supportlevel:
A
- pagelastupdated:
2020-05-15
- maintainer:
NVIDIA provides many different docker images containing scientific software through their NGC repository. This software is available for free for NVIDIA’s GPUs and one can register for free to get access to the images.
You can use these images as a starting point for your own GPU images, but do be mindful of NVIDIA’s terms and conditions. If you want to store your own images that are based on NGC images, either use NGC itself or our own Docker registry that is documented on the singularity containers page.
We have converted some of these images with minimal changes into singularity images that are available in Triton.
Currently updated images are:
nvidia-tensorflow
: Contains Tensorflow. Due to major changes that happened between Tensorflow v1 and v2, image versions have either tf1 or tf2 to designate the major version of Tensorflow.
nvidia-pytorch
: Contains PyTorch.
There are various other images available that can be installed very quickly if required.
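To see which container modules are currently available, you can query the module system with standard Lmod commands:
# List modules whose names match "nvidia"
module spider nvidia
# Show the available versions of a specific image
module spider nvidia-tensorflow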
Running simple Tensorflow/Keras model with NVIDIA’s containers
Let’s run the MNIST example from Tensorflow’s tutorials:
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
The full code for the example is in
tensorflow_mnist.py
.
One can run this example with srun
:
wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/tensorflow/tensorflow_mnist.py
module load nvidia-tensorflow/20.02-tf1-py3
srun --time=00:15:00 --gres=gpu:1 singularity_wrapper exec python tensorflow_mnist.py
or with sbatch
by submitting
tensorflow_singularity_mnist.sh
:
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:15:00
module load nvidia-tensorflow/20.02-tf1-py3
singularity_wrapper exec python tensorflow_mnist.py
Do note that by default Keras downloads datasets to $HOME/.keras/datasets
.
Running simple PyTorch model with NVIDIA’s containers
Let’s run the MNIST example from PyTorch’s tutorials:
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 20, 5, 1)
self.conv2 = nn.Conv2d(20, 50, 5, 1)
self.fc1 = nn.Linear(4*4*50, 500)
self.fc2 = nn.Linear(500, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4*4*50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
The full code for the example is in
pytorch_mnist.py
.
One can run this example with srun
:
wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/pytorch/pytorch_mnist.py
module load nvidia-pytorch/20.02-py3
srun --time=00:15:00 --gres=gpu:1 singularity_wrapper exec python pytorch_mnist.py
or with sbatch
by submitting
pytorch_singularity_mnist.sh
:
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:15:00
module load nvidia-pytorch/20.02-py3
singularity_wrapper exec python pytorch_mnist.py
The Python script will download the MNIST dataset to the data folder.
Octave
- From Octave’s web page:
GNU Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab. It may also be used as a batch-oriented language.
Octave has extensive tools for solving common numerical linear algebra problems, finding the roots of nonlinear equations, integrating ordinary functions, manipulating polynomials, and integrating ordinary differential and differential-algebraic equations. It is easily extensible and customizable via user-defined functions written in Octave’s own language, or using dynamically loaded modules written in C++, C, Fortran, or other languages.
Getting started
Simply load the latest version of Octave.
module load octave
octave
It is best to pick a version of octave and stick with it. Do module
spider octave
and use the whole name:
module load octave/4.4.1-qt-python2
To run octave with the GUI, run it with:
octave --force-gui
Installing packages
Before installing packages you should create a file ~/.octaverc
with the
following content:
package_dir = ['/scratch/work/',getenv('USER'),'/octave'];
eval (["pkg prefix ",package_dir, ";"]);
setenv("CXX","g++ -std=gnu++11")
setenv("DL_LD","g++ -std=gnu++11")
setenv("LD_CXX","g++ -std=gnu++11")
setenv("CC","gcc")
setenv("F77","gfortran")
This sets up /scratch/work/$USER/octave to be your Octave package directory and sets gcc to be your compiler. By putting the Octave package directory in your work directory you won't run into quota issues.
After this you should load the gcc and texinfo modules. This gives you an up-to-date compiler and the tools that Octave uses for its documentation:
module load gcc
module load texinfo
Now you can install packages in octave with e.g.:
pkg install -forge -local io
After this you can unload the gcc and texinfo modules:
module unload gcc
module unload texinfo
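To check that the installation worked, you can load the freshly installed package from a one-off Octave command (a minimal sketch using the io package installed above):
module load octave
# Load the package and print a confirmation; an error here means the
# installation did not succeed
octave --eval "pkg load io; disp('io loaded OK')"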
OpenFoam
OpenFoam is a popular open source CFD software. There are two main forks of the same software available:
- OpenFOAM maintained by OpenCFD (an affiliate of the ESI Group). Their website is www.openfoam.com and the source code is maintained in their own GitLab repository. They use version numbers based on the year and month of the release, e.g. 1906.
- OpenFOAM maintained by the OpenFOAM Foundation. Their website is www.openfoam.org and their source code is maintained in various repositories on GitHub. They use integer version numbers, e.g. 8.
Several versions of both forks are installed on Triton.
OpenFOAM installations
Below is a list of installed OpenFOAM versions:
OpenFOAM provider | Version | Module name
---|---|---
openfoam.com | v1906 | openfoam/1906-openmpi-metis
openfoam.org | 9 | openfoam-org/9-openmpi-metis
openfoam.org | 8 | openfoam-org/8-openmpi-metis
openfoam.org | 7 | openfoam-org/7-openmpi-metis
Running OpenFOAM
OpenFOAM installations are built using OpenMPI, and thus one should reserve the resources following the MPI instructions.
When running the MPI-enabled programs, one should launch them with srun. This enables Slurm to allocate the tasks correctly.
Some programs included in the OpenFOAM installation (such as blockMesh and decomposePar) do simulation initialization in a serial fashion and should be called without srun.
Examples
Running damBreak example
One popular simple example simulates a dam breaking in two dimensions. For more information on the example, see this article.
First, we need to take our own copy of the example:
module load openfoam-org/9-openmpi-metis
cp -r $FOAM_TUTORIALS/multiphase/interFoam/laminar/damBreak/damBreak .
Second, we need to write a Slurm script
run_dambreak.sh
:
#!/bin/bash -l
#SBATCH --time=00:05:00
#SBATCH --mem=4G
#SBATCH --ntasks=4
#SBATCH --output=damBreak.out
set -e
module load openfoam-org/9-openmpi-metis
cd damBreak
blockMesh
decomposePar
srun interFoam -parallel
After this we can submit the Slurm script to the queue with
sbatch run_dambreak.sh
. The program will run in the queue and we will get
results in damBreak.out
and in the simulation folder.
Do note that some programs (blockMesh
, decomposePar
) do not
require multiple MPI tasks. Thus these are run without srun
. By
contrast, the program call that does the main simulation
(interFoam -parallel
) uses multiple MPI tasks and thus is called
via srun
.
OpenPose
This uses Singularity containers, so you should refer to that page first for general information.
OpenPose has been compiled against OpenBLAS, Caffe, CUDA and cuDNN. The image is based on the nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04 Docker image.
Dockerfile for this image is available here.
Within the container OpenPose is installed under /opt/openpose
. Due to
the way the libraries are organized, singularity_wrapper
changes the
working directory to /opt/openpose
.
Running OpenPose example
One can run this example by submitting it with sbatch:
wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/openpose/openpose.sh
module load singularity-openpose
sbatch openpose.sh
Example sbatch script
is
shown below.
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=8G
#SBATCH --gres=gpu:1
module load singularity-openpose/v1.5.1
# Print out usage flags
singularity_wrapper exec openpose --help
# Run example
singularity_wrapper exec openpose --video /opt/openpose/examples/media/video.avi --display 0 --write_video $(pwd)/openpose.avi
ORCA
ORCA is a scientific software that provides cutting-edge methods in the fields of density functional theory and correlated wave-function based methods.
Basic Usage
You can do a simple run with ORCA with the
following script
.
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=2G
#SBATCH --output=orca_example.out
module load orca/4.2.1-openmpi
rm -f water*
cat > water.inp << EOF
!HF
!DEF2-SVP
!PAL4
* xyz 0 1
O 0.0000 0.0000 0.0626
H -0.7920 0.0000 -0.4973
H 0.7920 0.0000 -0.4973
*
EOF
# Parallel runs need the full path to orca executable
# Do not use srun as orca will call mpi independently: https://www.orcasoftware.de/tutorials_orca/first_steps/trouble_install.html#using-openmpi
$(command -v orca) water.inp
This script performs a parallel run of ORCA to simulate the behavior of a water molecule. The input file for this simulation is called water.inp, which is written by the cat command in the script.
To run this script, download it and submit into the queue using sbatch
:
$ wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/orca/orca_example.sh
$ sbatch orca_example.sh
How to launch ORCA when using MPI parallelism
When doing parallel runs you should always launch ORCA with
$(command -v orca) input_file.inp
in your Slurm scripts. This is because ORCA will need the executable to be launched with the full path of the executable and it will launch the MPI tasks independently. For more information, see this documentation page.
Setting the number of MPI tasks
The example given above asked for 4 MPI tasks by setting
#SBATCH --ntasks-per-node=4
in the Slurm batch script and then told
ORCA to use 4 tasks by setting !PAL4
in the input file.
When asking for more than 8 tasks you need to use %PAL NPROCS 16 END to set the number of tasks in the ORCA input (here, the line would specify 16 tasks).
For more information please refer to ORCA’s documentation page on parallel calculations.
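For example, the water input from the script above could be rewritten to request 16 tasks with %PAL (a sketch; the Slurm script must then also request 16 tasks via --ntasks-per-node=16):
# Sketch: the same water job as above, but with 16 MPI tasks via %PAL
cat > water.inp << EOF
!HF
!DEF2-SVP
%PAL NPROCS 16 END
* xyz 0 1
O 0.0000 0.0000 0.0626
H -0.7920 0.0000 -0.4973
H 0.7920 0.0000 -0.4973
*
EOF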
Paraview
As a module
A serial version is available on login2. You will need to use the
“forward connection” strategy by using ssh port forwarding. For example,
run ssh -L BBBB:nnnNNN:AAAA username@triton, where BBBB is the local port you connect to, nnnNNN is the node name, and AAAA is the port on that node. See this FAQ question.
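As a concrete sketch (the node name nnn123 is hypothetical; 11111 is ParaView's usual server port, but use whatever port your server reports):
# Forward local port 11111 to port 11111 on compute node nnn123,
# tunnelled through the Triton login node
ssh -L 11111:nnn123:11111 username@triton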
See issue #13: https://version.aalto.fi/gitlab/AaltoScienceIT/triton/issues/13 for some user experiences. (Note: the author of this entry is not a paraview expert, suggestions welcome.)
As a container
You can also use paraview via Singularity containers, so you should refer to that page first for general information. It is part of the OpenFoam container.
Python
Python is a widely used programming language; we have installed all the basic packages on every node. Yet Python develops quite fast, and the system-provided packages are often incomplete or outdated.
Python distributions
Use case | Python to use | How to install own packages
---|---|---
I don't really care, I just want recent stuff and to not worry | Anaconda |
Simple programs with common packages, not switching between Pythons often | Anaconda |
Your own conda environment | Miniconda | conda environment + conda
Your own virtual environment | Module virtualenv | virtualenv + pip + setuptools
The main version of modern Python is 3. Support for old Python 2 ended at the end of 2019. There are also different distributions: the "regular" CPython, Anaconda (a package containing CPython + a lot of other scientific software all bundled together), and PyPy (a just-in-time compiler, which can be much faster for some use cases). Triton supports all of these.
For general scientific/data science use, we suggest that you use Anaconda. It comes with the most common scientific software included, and is reasonably optimized.
There are many other “regular” CPython versions in the module system. These are compiled and optimized for Triton, and are highly recommended. The default system Python is old and won’t be updated.
Make sure your environments are reproducible - you can recreate
them from scratch. History shows you will probably have to do this
eventually, and it also ensures that others can always use your code.
We recommend a minimal requirements.txt
(pip) or
environment.yml
(conda), hand-created with the minimal
dependencies in there.
Quickstart
Use module load anaconda
to get our Python installation.
If you have simple needs, use pip install --user to install packages. For complex needs, use Anaconda + conda environments to isolate your projects.
Install your own packages easily
Warning
pip install --user
can result in incompatibilities
If you do this, then the packages will be shared among all your projects. It is quite likely that eventually you will get some incompatibilities between the Python you are using and the packages installed. In that case, you are on your own (the simple recommendation is to remove all packages from ~/.local/lib/pythonN.N and reinstall). If you get incompatible package errors, our first recommendation will be to remove everything installed this way and use conda/virtual environments instead. It's not a bad idea to do this when you switch to environments anyway.
If you encounter problems, remove all your user packages:
$ rm -r ~/.local/lib/python*.*/
and reinstall everything after loading the environment you want.
Installing your own packages with pip install
won’t work, since it
tries to install globally for all users. Instead, you should do this
(add --user
) to install the package in your home directory
(~/.local/lib/pythonN.N/
):
$ pip install --user $package_name
This is quick and effective, but best used for leaf packages without many dependencies, and only if you don't switch Python modules often.
Note
Example of dangers of pip install --user
Someone did pip install --user tensorflow
. Some time later,
they noticed that they couldn’t use Tensorflow + GPUs. We couldn’t
reproduce the problem, but in the end found they had this local
install that was hiding any Tensorflow in any module (forcing a CPU
version on them).
Note: pip
installs from the Python Package Index.
Anaconda and conda environments
Anaconda is a Python distribution by Continuum Analytics (open source, of course). It is nothing fancy, they just take a lot of useful scientific packages and their dependencies and put them all together, make sure they work, and do some optimization. They also include most of the most common computing and data science packages and non-Python compiled software and libraries. It is also all open source, and is packaged nicely so that it can easily be installed on any major OS.
To load anaconda, use the module system (you can also load specific versions):
$ module load anaconda # python3
$ module load anaconda2 # python2
Note
Before 2020, Python3 was via the anaconda3
module (note the
3
on the end). That’s still there, but in 2020 we completely
revised our Anaconda installation system, and dropped active
maintenance of Python 2. All updates are in anaconda
only in
the future.
Conda environments
See also
Watch a Research Software Hour episode on conda for an introduction + demo.
If you encounter a situation where you need to create your own environment, we recommend that you use conda environments. When you create your own environment the packages from the base environment (default environment installed by us) will not be used, but you can choose which packages you want to install.
We nowadays recommend that you use the miniconda
-module for installing these
environments. Miniconda is basically a minimal Anaconda installation that can be used to
create your own environments.
By default conda tries to install packages into your home folder, which can result in running out of quota. To fix this, you should run the following commands once:
$ module load miniconda
$ mkdir $WRKDIR/.conda_pkgs
$ mkdir $WRKDIR/.conda_envs
$ conda config --append pkgs_dirs ~/.conda/pkgs
$ conda config --append envs_dirs ~/.conda/envs
$ conda config --prepend pkgs_dirs $WRKDIR/.conda_pkgs
$ conda config --prepend envs_dirs $WRKDIR/.conda_envs
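You can verify that the settings took effect by printing them back:
$ conda config --show pkgs_dirs
$ conda config --show envs_dirs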
virtualenv does not work with Anaconda, use conda
instead.
Load the miniconda module. You should look up the version and load the same version each time you source the environment:
## Load miniconda first. This must always be done before activating the env!
$ module load miniconda
Create an environment. This needs to be done once:
## create environment with the packages you require
$ conda create -n ENV_NAME python pip ipython tensorflow-gpu pandas ...
Activate the environment. This needs to be done every time you load the environment:
## This must be run in each shell to set up the environment variables properly.
## make sure module is loaded first.
$ source activate ENV_NAME
Activating and using the environment, installing more packages, etc. can be done either using conda install or pip install:
## Install more packages, either conda or pip
$ conda search PACKAGE_NAME
$ conda install PACKAGE_NAME
$ pip install PACKAGE_NAME
Leaving the environment when done (optional):
## Deactivate the environment
$ source deactivate
To activate an environment from a Slurm script:
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
source activate ENV_NAME
srun echo "This step is run inside the activated conda environment!"
source deactivate
Worst case, you have incompatibility problems. Remove everything, including the stuff installed with pip install --user. If you've mixed your personal stuff in with this, then you will have to separate it out:
## Remove anything installed with pip install --user.
$ rm -r ~/.local/lib/python*.*/
A few notes about conda environments:
- Once you use a conda environment, everything goes into it. Don't mix versions with, for example, local packages in your home dir and pip install --user. Things installed (even previously) with pip install --user will be visible in the conda environment and can make your life hard! Eventually you'll get dependency problems.
- Often the same goes for other Python-based modules. We have set up many modules that use Anaconda as a backend, so if you know what you are doing this might work.
conda init
, conda activate
, and source activate
We don’t recommend doing conda init
like many sources
recommend: this will permanently affect your .bashrc
file and
make hard-to-debug problems later. The main points of conda
init
are to a) automatically activate an environment (not good on
a cluster: make it explicit so it can be more easily debugged)
and b) make conda
a shell function (not command) so that
conda activate
will work (source activate
works as well in
all cases, no confusion if others don’t.)
- If you activate one environment from another, for example after loading an anaconda module, do source activate ENV_NAME like shown above (a conda installation in the environment is not needed).
- If you make your own standalone conda environments, install the conda package in them, then…
- Activate a standalone environment with conda installed in it by source PATH/TO/ENV_DIRECTORY/bin/activate (which incidentally activates just that one session for conda).
Python: virtualenv
Virtualenv is the default Python way of making environments, but it does not work with Anaconda. We generally recommend using Anaconda, since it includes a lot more by default, but virtualenv works easily on other systems so it's good to know about.
## Load module python
$ module load py-virtualenv
## Create environment
$ virtualenv DIR
## activate it (in each shell that uses it)
$ source DIR/bin/activate
## install more things (e.g. ipython, etc.)
$ pip install PACKAGE_NAME
## deactivate the virtualenv
$ deactivate
Anaconda/virtualenvironments in Jupyter
If you make a conda environment / virtual environment, you can use it from Triton’s JupyterHub (or your own Jupyter). See Installing kernels from virtualenvs or Anaconda environments.
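The gist of it is registering a kernel spec from inside the activated environment (a sketch that assumes the environment already contains the ipykernel package; see the linked page for the full instructions):
$ source activate ENV_NAME
# Register this environment as a Jupyter kernel named ENV_NAME
$ python -m ipykernel install --user --name ENV_NAME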
IPython Parallel
ipyparallel is a tool for running embarrassingly parallel code using Python. The basic idea is that you have a controller and engines. You have a client process which is actually running your own code.
Preliminary notes: ipyparallel is installed in the anaconda{2,3}/latest modules.
Let’s say that you are doing some basic interactive work:
Controller: this can run on the frontend node, or you can put it on a script. To start:
ipcontroller --ip="*"
Engines:
srun -N4 ipengine
: This runs the four engines in Slurm interactively. You don't need to interact with this once it is running, but remember to stop the process once it is done because it is using resources. You can start/stop this as needed.
Start your Python process and use things like normal:
import os
import ipyparallel
client = ipyparallel.Client()
result = client[:].apply_async(os.getpid)
pid_map = result.get_dict()
print(pid_map)
This method lets you turn on/off the engines as needed. This isn’t the most advanced way to use ipyparallel, but works for interactive use.
See also: IPython parallel for a version which goes in a slurm script.
Background: pip vs python vs anaconda vs conda vs virtualenv
Virtual environments are self-contained python environments with all of their own modules, separate from the system packages. They are great for research where you need to be agile and install whatever versions and packages you need. We highly recommend virtual environments or conda environments (below)
Anaconda: use conda, see below
Normal Python: virtualenv + pip install, see below
You often need to install your own packages. Python has its own package manager system that can do this for you. There are three important related concepts:
- pip: the Python package installer. Installs Python packages globally, in a user's directory (--user), or anywhere. Installs from the Python Package Index.
- virtualenv: Creates a directory that has all self-contained packages that is manageable by the user themself. When the virtualenv is activated, all the operating-system global packages are no longer used. Instead, you install only the packages you want. This is important if you need to install specific versions of software, and it also provides isolation from the rest of the system (so that your work can be uninterrupted). It also allows different projects to have different versions of things installed. virtualenv isn't magic; it could almost be seen as just manipulating PYTHONPATH, PATH, and the like. Docs: https://docs.python-guide.org/dev/virtualenvs/
- conda: Sort of a combination of package manager and virtual environment. However, it only installs packages into environments, and it is not limited to Python packages. It can also install other libraries (C, Fortran, etc.) into the environment. This is extremely useful for scientific computing, and the reason it was created. Docs for envs: https://conda.io/projects/conda/en/latest/user-guide/concepts/environments.html
So, to install packages, there are pip and conda. To make virtual environments, there are venv and conda.
Advanced users can see this rosetta stone for reference.
On Triton we have added some packages on top of the Anaconda installation, so cloning the entire Anaconda environment to a local conda environment will not work (it is not a good idea in the first place, but some users try this every now and then).
Examples
Running Python with OpenMP parallelization
Various Python packages such as NumPy, SciPy and pandas can utilize OpenMP to run on multiple CPUs. As an example, let's run the Python script python_openmp.py, which calculates the (pseudo)inverse of five symmetric matrices of size 2000x2000.
from time import time

import numpy as np

nrounds = 5
t_start = time()
for i in range(nrounds):
    a = np.random.random([2000,2000])
    a = a + a.T
    b = np.linalg.pinv(a)
t_delta = time() - t_start
print('Seconds taken to invert %d symmetric 2000x2000 matrices: %f' % (nrounds, t_delta))
The full code for the example is in
HPC examples-repository.
One can run this example with srun
:
wget https://raw.githubusercontent.com/AaltoSciComp/hpc-examples/master/python/python_openmp/python_openmp.py
module load anaconda/2022-01
export OMP_PROC_BIND=true
srun --cpus-per-task=2 --mem=2G --time=00:15:00 python python_openmp.py
or with sbatch
by submitting
python_openmp.sh
:
#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=1G
#SBATCH -o python_openmp.out
module load anaconda/2022-01
export OMP_PROC_BIND=true
echo 'Running on: '$HOSTNAME
srun python python_openmp.py
Important
Python has a global interpreter lock (GIL), which forces some operations to be executed on only one thread; while these operations are occurring, other threads will be idle. Such operations include reading files and print statements. Thus one should be extra careful with multithreaded code, as it is easy to create seemingly parallel code that does not actually utilize multiple CPUs.
There are ways to minimize effects of GIL on your Python code and if you’re creating your own multithreaded code, we recommend that you take this into account.
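One simple safeguard, sketched below as a fragment for a batch script like the one above, is to pin the thread count of the underlying OpenMP libraries explicitly to the Slurm allocation:
# Inside a batch script: make OpenMP-based libraries use exactly the
# CPUs allocated to the task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_PROC_BIND=true
srun python python_openmp.py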
Running MPI parallelized Python with mpi4py
MPI-parallelized Python requires a valid MPI installation that supports our Slurm scheduler. Thus Anaconda is not the best option. We have installed MPI-supporting Python versions in different toolchains.
Using mpi4py is quite easy. Example is provided below.
Python MPI4py
A simple script mpi4py_hello.py that utilizes mpi4py. (Do not name the file mpi4py.py itself; a script with the same name as the package would shadow the mpi4py import.)
#!/usr/bin/env python
"""
Parallel Hello World
"""
from mpi4py import MPI
import sys
size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
sys.stdout.write(
"Hello, World! I am process %d of %d on %s.\n"
% (rank, size, name))
Running mpi4py_hello.py using only srun:
module load Python/2.7.11-goolf-triton-2016b
srun --time=00:10:00 --ntasks=4 python mpi4py_hello.py
Example sbatch script mpi4py_hello.sh for running mpi4py_hello.py through sbatch:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=4
module load Python/2.7.11-goolf-triton-2016b
mpiexec -n $SLURM_NTASKS python mpi4py_hello.py
Python Environments with Conda
Conda is a popular package manager that is especially popular in data science and machine learning communities.
It is commonly used to handle complex requirements of Python and R packages.
Quick usage guide
First time setup
You can get conda by loading the miniconda
-module:
$ module load miniconda
By default Conda stores installed packages and environments in your home directory. However, as your home directory has a lower quota, it is a good idea to tell conda to install packages and environments into your work directory:
$ mkdir $WRKDIR/.conda_pkgs
$ mkdir $WRKDIR/.conda_envs
$ conda config --append pkgs_dirs ~/.conda/pkgs
$ conda config --append envs_dirs ~/.conda/envs
$ conda config --prepend pkgs_dirs $WRKDIR/.conda_pkgs
$ conda config --prepend envs_dirs $WRKDIR/.conda_envs
Now you’re all set up to create your first environment.
Creating a simple environment with conda
One can install environments from the command line itself, but a better idea
is to write an environment.yml
-file that describes the environment.
Below we have a simple environment.yml
:
name: conda-example
channels:
- conda-forge
dependencies:
- numpy
- pandas
Now we can use the conda
-command to create the environment:
$ module load miniconda
$ conda env create --file environment.yml
Once the environment is installed, you can activate it with:
$ source activate conda-example
conda init
, conda activate
, and source activate
We don’t recommend doing conda init
like many sources
recommend: this will permanently affect your .bashrc
file and
make hard-to-debug problems later. The main points of conda
init
are to a) automatically activate an environment (not good on
a cluster: make it explicit so it can be more easily debugged)
and b) make conda
a shell function (not command) so that
conda activate
will work (source activate
works as well in
all cases, no confusion if others don’t.)
- If you activate one environment from another, for example after loading an anaconda module, do source activate ENV_NAME like shown above (a conda installation in the environment is not needed).
- If you make your own standalone conda environments, install the conda package in them, then…
- Activate a standalone environment with conda installed in it by source PATH/TO/ENV_DIRECTORY/bin/activate (which incidentally activates just that one session for conda).
Resetting conda
Sometimes it is necessary to reset your Conda configuration, so here are instructions on how to wipe all of your conda settings and existing environments. To be able to do so, first make conda available; on Triton, load the miniconda module:
$ module load miniconda
First, check where conda stores your environments:
$ conda config --show envs_dirs
$ conda config --show pkgs_dirs
Delete the listed directories that start with /home/USERNAME (this could e.g. be /home/USERNAME/.conda/envs) and /scratch/ (e.g. /scratch/work/USERNAME/conda_envs). You would delete these with rm -r DIRNAME, but be careful that you use the right paths, because there is no going back.
This will clean up all packages and environments you have installed.
Next, clean up your .bashrc
, .zshrc
, .kshrc
and .cshrc
(whichever ones exist for you).
Open these files in an editor (e.g. nano .bashrc), search for the line # >>> conda initialize >>>, and delete everything from it up to and including the line # <<< conda initialize <<<. These lines automatically initialize conda upon login, which can cause a lot of trouble on a cluster.
Finally delete the file .condarc
from your home folder ( rm ~/.condarc
) to reset your conda configuration.
After this close the current connection to triton and reconnect in a new session.
Now you should have a system that doesn’t have any remains of conda, so you can now follow the initial steps as detailed here.
Understanding the environment file
Conda environment files are written using YAML syntax. In an environment file one usually defines the following:
- name: Name of the desired environment.
- channels: Which channels to use for packages.
- dependencies: Which conda and pip packages to install.
Choosing conda channels
When an environment file is used to create an environment, conda looks up the list of channels (in descending priority) and it will try to find the needed packages.
Some of the most popular channels are:
- conda-forge: An open-source channel with over 18k packages. Highly recommended for new environments. Most packages in the anaconda modules come from here.
- defaults: A channel maintained by Anaconda Inc. Free for non-commercial use. Default for the Anaconda distribution.
- r: A channel of R packages maintained by Anaconda Inc. Free for non-commercial use.
- bioconda: A community-maintained channel of bioinformatics packages.
- pytorch: Official channel for PyTorch, a popular machine learning framework.
One can have multiple channels defined like in the following example:
name: pytorch-env
channels:
- nvidia
- pytorch
- conda-forge
dependencies:
- pytorch
- pytorch-cuda=12.1
- torchvision
- torchaudio
Setting package dependencies
Packages in environment.yml
can have version constraints and version
wildcards. One can also specify pip packages to install after conda-packages
have been installed.
For example, the following
dependency-env.yml
would install a NumPy version greater than or equal to 1.10 using conda, and SciPy via pip:
name: dependency-env
channels:
- conda-forge
dependencies:
- numpy>=1.10.*
- pip
- pip:
- scipy
Listing packages in an environment
To list packages installed in an environment, one can use:
$ conda list
Removing an environment
To remove an environment, one can use:
$ conda env remove --name environment_name
Do remember to deactivate the environment before trying to remove it.
Cleaning up conda cache
Conda uses a cache for downloaded and installed packages. This cache can get large or it can be corrupted by failed downloads.
In these situations one can use conda clean
to clean up the cache.
- conda clean -i cleans up the index cache that conda uses to find the packages.
- conda clean -t cleans up downloaded package installers.
- conda clean -p cleans up unused packages.
- conda clean -a cleans up all of the above.
Installing new packages into an environment
Installing new packages into an existing environment can be done with
conda install
-command. The following command would install matplotlib
from conda-forge
into an environment.
$ conda install --freeze-installed --channel conda-forge matplotlib
Installing packages into an existing environment can be risky: conda uses the channels given on the command line when it determines which channels to use for the new packages.
This can cause a situation where installing a new package results in the removal and reinstallation of multiple packages. Adding the --freeze-installed flag keeps already-installed packages safe, and by giving the channels explicitly, one can make certain that the new packages come from the same source.
It is usually a better option to create a new environment with the new
package set as an additional dependency in the environment.yml
.
This keeps the environment reproducible.
If you intend on installing packages to existing environment, adding default channels for the environment can also make installing packages easier.
Setting default channels for an environment
It is a good idea to store channels used when creating the environment into a configuration file that is stored within the environment. This makes it easier to install any missing packages.
For example, one could add conda-forge
into the list of default channels
with:
$ conda config --env --add channels conda-forge
We can check the contents of the configuration file with:
$ cat $CONDA_PREFIX/.condarc
Doing everything faster with mamba
mamba is a drop-in replacement for conda that does environment building and solving much faster than conda.
To use it, you either need to install mamba
-package from
conda-forge
-channel or use the miniconda
-module.
If you have mamba
, you can just switch from using conda
-command
to using mamba
and it should work in the same way, but faster.
For example, one could create an environment with:
$ mamba env create --file environment.yml
Motivation for using conda
When should you use conda?
If you need basic Python packages, you can use pre-installed
anaconda
-modules. See the Python-page for
more information.
You should use conda when you need to create your own custom environment.
Why use conda? What are its advantages?
Quite often Python packages are installed with Pip from the Python Package Index (PyPI). These packages contain Python code and in many cases some compiled code as well.
However, there are three problems pip cannot solve without additional tools:
How do you install multiple separate suites of packages for different use cases?
How do you handle packages that depend on some external libraries?
How do you make sure that all of the packages are compatible with each other?
Conda tries to solve these problems with the following ways:
Conda creates environments where packages are installed. Each environment can be activated separately.
Conda installs library dependencies to the environment with the Python packages.
Conda uses a solver engine to figure out whether packages are compatible with each other.
Conda also caches installed packages so doing copies of similar environments does not use additional space.
One can also use the environment files to make the installation procedure more reproducible.
Creating an environment with CUDA toolkit
NVIDIA’s CUDA-toolkit is needed for working with NVIDIA’s GPUs. Many Python frameworks that work on GPUs need to have a supported CUDA toolkit installed.
Conda is often used to provide the CUDA toolkit and additional libraries such as cuDNN. However, one should choose the version of the CUDA toolkit based on what the software requires.
If the package is installed from a conda channel such as conda-forge
,
conda will automatically retrieve the correct version of the CUDA toolkit.
In other cases one can use an environment file like this
cuda-env.yml
:
name: cuda-env
channels:
- conda-forge
dependencies:
- cudatoolkit
Hint
During installation conda will try to detect the maximum CUDA version that the installed graphics cards can support, and it will install non-CUDA-enabled versions by default if none are found (as is the case on the login node, where environments are normally built). This can usually be overcome by setting explicitly that the packages should be the CUDA-enabled ones. It might, however, happen that the environment creation process aborts with a message similar to:
nothing provides __cuda needed by tensorflow-2.9.1-cuda112py310he87a039_0
In this instance it might be necessary to override the CUDA settings used by
conda/mamba.
To do this, prefix your environment creation command with CONDA_OVERRIDE_CUDA=CUDAVERSION
,
where CUDAVERSION is the CUDA toolkit version you intend to use as in:
CONDA_OVERRIDE_CUDA="11.2" mamba env create -f cuda-env.yml
This will allow conda to assume that the respective CUDA libraries will be present at a later point and so it will skip those requirements during installation.
For more information, see this helpful post in Conda-Forge’s documentation.
Creating an environment with GPU enabled Tensorflow
To create an environment with GPU enabled Tensorflow you can use an
environment file like this
tensorflow-env.yml
:
name: tensorflow-env
channels:
- conda-forge
dependencies:
- tensorflow=*=*cuda*
Here we install the latest tensorflow from conda-forge
-channel with an additional
requirement that the build version of the tensorflow
-package must contain
a reference to a CUDA toolkit. For a specific version replace the =*=*cuda*
with e.g. =2.8.1=*cuda*
for version 2.8.1
.
If you encounter errors related to CUDA while creating the environment, do note this hint on overriding CUDA during installation.
Creating an environment with GPU enabled PyTorch
To create an environment with GPU enabled PyTorch you can use an
environment file like this
pytorch-env.yml
:
name: pytorch-env
channels:
- nvidia
- pytorch
- conda-forge
dependencies:
- pytorch
- pytorch-cuda=12.1
- torchvision
- torchaudio
Here we install the latest pytorch version from the pytorch channel and the pytorch-cuda metapackage, which makes certain that the additional packages required by pytorch are installed from the conda-forge channel.
If you encounter errors related to CUDA while creating the environment, do note this hint on overriding CUDA during installation.
Installing numpy with Intel MKL enabled BLAS
NumPy and other mathematical libraries utilize a BLAS (Basic Linear Algebra Subprograms) implementation for speeding up many operations. Intel provides its own fast BLAS implementation in Intel MKL (Math Kernel Library). When using Intel CPUs, this library can give a significant performance boost to mathematical calculations.
One can install this library as the default BLAS by specifying
blas * mkl
as a requirement in the dependencies like in this
mkl-env.yml
:
name: mkl-env
channels:
- conda-forge
dependencies:
- blas * mkl
- numpy
Advanced usage
Finding available packages
Because conda tries to make certain that all packages in an environment are compatible with each other, there are usually tens of different versions of a single package.
One can search for a package from a channel with the following command:
$ mamba search --channel conda-forge tensorflow
This will return a long list of packages where each line looks something like this:
tensorflow 2.8.1 cuda112py39h01bd6f0_0 conda-forge
Here we have:
- The package name (tensorflow).
- Version of the package (2.8.1).
- Package build version. This version often contains information on:
  - Python version needed by the package (py39, i.e. Python 3.9).
  - Other libraries used by the package (cuda112, i.e. CUDA 11.2).
- Channel where the package comes from (conda-forge).
Checking package dependencies
One can check package dependencies by adding the --info
-flag to the
search command. This can give a lot of output, so it is a good idea to
limit the search to one specific package:
$ mamba search --info --channel conda-forge tensorflow=2.8.1=cuda112py39h01bd6f0_0
The output looks something like this:
tensorflow 2.8.1 cuda112py39h01bd6f0_0
--------------------------------------
file name : tensorflow-2.8.1-cuda112py39h01bd6f0_0.tar.bz2
name : tensorflow
version : 2.8.1
build : cuda112py39h01bd6f0_0
build number: 0
size : 26 KB
license : Apache-2.0
subdir : linux-64
url : https://conda.anaconda.org/conda-forge/linux-64/tensorflow-2.8.1-cuda112py39h01bd6f0_0.tar.bz2
md5 : 35716504c8ce6f685ae66a1d9b084fc7
timestamp : 2022-05-21 09:09:53 UTC
dependencies:
- __cuda
- python >=3.9,<3.10.0a0
- python_abi 3.9.* *_cp39
- tensorflow-base 2.8.1 cuda112py39he716a45_0
- tensorflow-estimator 2.8.1 cuda112py39hd320b7a_0
Packages with underscores are meta-packages that should not be added to conda environment specifications. They will be solved by conda automatically.
Here we can see more info on the package, including its dependencies.
When using mamba, one can also use mamba repoquery depends
to
see the dependencies:
$ mamba repoquery depends --channel conda-forge tensorflow=2.8.1=cuda112py39h01bd6f0_0
Output looks something like this:
Name Version Build Channel
─────────────────────────────────────────────────────────────────────────────
tensorflow 2.8.1 cuda112py39h01bd6f0_0 conda-forge/linux-64
__cuda >>> NOT FOUND <<<
python 3.9.9 h62f1059_0_cpython conda-forge/linux-64
python_abi 3.9 2_cp39 conda-forge/linux-64
tensorflow-base 2.8.1 cuda112py39he716a45_0 conda-forge/linux-64
tensorflow-estimator 2.8.1 cuda112py39hd320b7a_0 conda-forge/linux-64
One can also print the full dependency tree with mamba repoquery depends --tree. This will produce a really long output.
$ mamba repoquery depends --tree --channel conda-forge tensorflow=2.8.1=cuda112py39h01bd6f0_0
Fixing conflicts between packages
Usually the first step in fixing conflicts between packages is to write a new environment file and list all required packages in the file as dependencies. A fresh solve of the environment can often result in a working environment.
Sometimes a single package does not have support for a specific version of Python or a specific version of the CUDA toolkit. In these cases it is usually beneficial to give the solver more flexibility by pinning fewer versions.
One can also use the search commands provided by mamba
to see what
dependencies individual packages have.
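As a sketch, starting from the GPU Tensorflow environment shown earlier, one could leave Python and the CUDA toolkit unpinned and let the solver pick compatible versions (the file name fixed-env.yml is arbitrary):
# Write a fresh environment file with as few pins as possible and re-solve
cat > fixed-env.yml << EOF
name: fixed-env
channels:
  - conda-forge
dependencies:
  - python
  - tensorflow=*=*cuda*
EOF
mamba env create --file fixed-env.yml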
PyTorch
- pagelastupdated:
2022-08-08
PyTorch is a commonly used Python package for deep learning.
Basic usage
First, check the tutorials up to and including GPU computing.
If you plan on using NVIDIA’s containers to run your model, please check the page about NVIDIA’s singularity containers.
The basic way to use PyTorch is via the Python in the anaconda
module.
Don’t load any additional CUDA modules, anaconda
includes everything.
Building your own environment with PyTorch
If you need a PyTorch version different to the one supplied with anaconda we recommend installing your own anaconda environment as detailed here.
Creating an environment with GPU enabled PyTorch
To create an environment with GPU enabled PyTorch you can use an
environment file like this
pytorch-env.yml
:
name: pytorch-env
channels:
- nvidia
- pytorch
- conda-forge
dependencies:
- pytorch
- pytorch-cuda=12.1
- torchvision
- torchaudio
Here we install the latest pytorch version from the pytorch channel and the pytorch-cuda metapackage, which makes certain that the additional packages required by pytorch are installed from the conda-forge channel.
Hint
During installation conda will try to detect the maximum CUDA version that the installed graphics cards can support, and it will install non-CUDA-enabled versions by default if none are found (as is the case on the login node, where environments are normally built). This can usually be overcome by setting explicitly that the packages should be the CUDA-enabled ones. It might, however, happen that the environment creation process aborts with a message similar to:
nothing provides __cuda needed by tensorflow-2.9.1-cuda112py310he87a039_0
In this instance it might be necessary to override the CUDA settings used by
conda/mamba.
To do this, prefix your environment creation command with CONDA_OVERRIDE_CUDA=CUDAVERSION
,
where CUDAVERSION is the CUDA toolkit version you intend to use as in:
CONDA_OVERRIDE_CUDA="11.2" mamba env create -f cuda-env.yml
This will allow conda to assume that the respective CUDA libraries will be present at a later point and so it will skip those requirements during installation.
For more information, see this helpful post in Conda-Forge’s documentation.
Examples
Simple PyTorch model
Let’s run the MNIST example from PyTorch’s tutorials:
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 20, 5, 1)
self.conv2 = nn.Conv2d(20, 50, 5, 1)
self.fc1 = nn.Linear(4*4*50, 500)
self.fc2 = nn.Linear(500, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4*4*50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
The full code for the example is in
pytorch_mnist.py
.
One can run this example with srun
:
$ wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/pytorch/pytorch_mnist.py
$ module load anaconda
$ srun --time=00:15:00 --gres=gpu:1 python pytorch_mnist.py
or with sbatch
by submitting
pytorch_mnist.sh
:
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:15:00
module load anaconda
python pytorch_mnist.py
The Python script will download the MNIST dataset to the data folder.
Running simple PyTorch model with NVIDIA’s containers
Let’s run the MNIST example from PyTorch’s tutorials:
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 20, 5, 1)
self.conv2 = nn.Conv2d(20, 50, 5, 1)
self.fc1 = nn.Linear(4*4*50, 500)
self.fc2 = nn.Linear(500, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4*4*50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
The full code for the example is in
pytorch_mnist.py
.
One can run this example with srun
:
wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/pytorch/pytorch_mnist.py
module load nvidia-pytorch/20.02-py3
srun --time=00:15:00 --gres=gpu:1 singularity_wrapper exec python pytorch_mnist.py
or with sbatch
by submitting
pytorch_singularity_mnist.sh
:
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:15:00
module load nvidia-pytorch/20.02-py3
singularity_wrapper exec python pytorch_mnist.py
The Python script will download the MNIST dataset to the data folder.
R
R is a language and environment for statistical computing and graphics with a wide user base. Several packages exist that can easily be imported into R.
Getting started
Simply load the latest R.
module load r
R
As any packages you install against R are specific to the version you
installed them with, it is best to pick a version of R and stick with it.
You can do this by checking the R version with module spider r
and
using the whole name when loading the module:
module load r/3.6.1-python3
If you want to detect the number of cores, you should use the proper Slurm environment variables (defaulting to all cores):
library(parallel)
as.integer(Sys.getenv('SLURM_CPUS_PER_TASK', parallel::detectCores()))
Installing packages
There are two ways to install packages.
You can usually install packages yourself, which allows you to keep up to date and reinstall as needed. Good instructions can be found here, for example:
R > install.packages('L1pack')
This should guide you to selecting a download mirror and offer you the option to install in your home directory.
If you have a lot of packages, you can run out of home quota. In this case you should move your package directory to your work directory and replace the ~/R directory with a symlink that points to your $WRKDIR/R. An example of doing this:
mv ~/R $WRKDIR/R
ln -s $WRKDIR/R ~/R
More info on R library paths can be found here. Looking at R startup can also be informative.
You can also put a request to the triton issue tracker and mention which R-version you are using.
Simple R serial job
Serial R example
#!/bin/bash -l
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --mem=100M
#SBATCH --output=r_serial.out
module load r
n=3
m=2
srun Rscript --vanilla r_serial.R $n $m
args = commandArgs(trailingOnly=TRUE)
n<-as.numeric(args[1])
m<-as.numeric(args[2])
print(n)
print(m)
A<-t(matrix(0:5,ncol=n,nrow=m))
print(A)
B<-t(matrix(2:7,ncol=n,nrow=m))
print(B)
C<-matrix(0.5,ncol=n,nrow=n)
print(C)
C<-A %*% t(B) + 2*C
print(C)
Simple R job using OpenMP for parallelization
R OpenMP Example
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH --output=r_openmp.out
module load r
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
time srun Rscript --default-packages=methods,utils,stats R-benchmark-25.R
The benchmark script is available here (more information about it is available on this page).
Simple R parallel job using ‘parallel’-package
Parallel R example
#!/bin/bash
#SBATCH --time=00:20:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH --output=r_parallel.out
# Set the number of OpenMP-threads to 1,
# as we're using parallel for parallelization
export OMP_NUM_THREADS=1
# Load the version of R you want to use
module load r
# Run your R script
srun Rscript r_parallel.R
library(pracma)
library(parallel)
invertRandom <- function(index) {
A<-matrix(runif(2000*2000),ncol=2000,nrow=2000);
A<-A + t(A);
B<-pinv(A);
return(max(B %*% A));
}
ptm<-proc.time()
mclapply(1:16, invertRandom, mc.cores=as.integer(Sys.getenv('SLURM_CPUS_PER_TASK')))
proc.time()-ptm
When constrained to opt-architecture, run times for different core numbers were:

ncores      | 1       | 2       | 4       | 8
------------|---------|---------|---------|--------
runtime (s) | 380.757 | 182.185 | 125.526 | 84.230
RStan
- supportlevel: B
- pagelastupdated: 2018-07-26
RStan is an R interface to Stan. Stan is a platform for statistical modeling and Bayesian computation.
Basic installation
RStan is installed as an R package and there is nothing too special about it.
First, load the R module you need to use. There are different options, using different compilers. Do not use an iomkl R version: it needs the Intel compilers on the compute nodes to compile your models every time you run, and they aren't available there. If you load a goolf R version, it will work (you could work around this by pre-compiling models, if you wanted):
$ module spider R
...
R/3.4.1-goolf-triton-2017a
R/3.4.1-iomkl-triton-2017a
$ module load R/3.4.1-goolf-triton-2017a
If you change R versions (from Intel to GCC) or get errors about loading libraries, you may have installed incompatible libraries. Removing your ~/R directory and reinstalling all of your libraries is a good place to start.
Notes
You should detect the number of cores with:
as.integer(Sys.getenv('SLURM_JOB_CPUS_PER_NODE', parallel::detectCores()))
Common Rstan problems
Models must be compiled on the machines that run them, be it Triton or other workstations. The compiled model files aren't necessarily portable, since they depend on the libraries available when built. One symptom of this problem is error messages about loading libraries, GLIBC_2.23, or some such.
In order to compile models, you must have the compiler available on the nodes. Thus, the Intel compilers (iomkl) won't work. They also won't work if the Intel compiler license servers are down. Using the GNU compiler toolchains is more reliable.
RStudio
- supportlevel: C
- pagelastupdated: 2014
RStudio (https://www.rstudio.com/) is an IDE for R. It was built on Triton with:
module load R/3.1.1-openblas boost/1.56 cmake/2.8.12.2 gcc/4.9.1 PrgEnv-gnu/0.1 qt/4.8.6
mkdir build && cd build
cmake .. -DRSTUDIO_TARGET=Desktop -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/share/apps/rstudio/0.98/ -DBOOST_ROOT=$BOOST_ROOT
Siesta & Transiesta
These are Makefiles copy-pasted from the old Rocks installation and should be used as a starting point. If you have a fully working version for SL6.2, please send us a copy.
See old wiki: https://wiki.aalto.fi/display/Triton/Applications
Rename siesta-3.0.arch.make.xxx => siesta-3.0-b/Obj/arch.make
Your own notebooks on Triton via sjupyter
Note
Now that Triton Jupyterhub exists, this method of running Jupyter is not so important. It is only needed if you need more resources than JupyterHub can provide.
We provide a command sjupyter which automates launching your own notebooks in the Slurm queue. To use it, module load sjupyter. This gives you more flexibility in choosing your nodes and resources than JupyterHub, but it will also affect your and your department's Triton priority more, because you are blocking others from using these resources.
Set up the proxy
When running Jupyter on another system, the biggest problem is always making the connection securely. To do this here, we use a browser extension and an SSH proxy.
Install the proxy extension
Install the extension FoxyProxy Standard (Firefox or Chrome). Some versions do not work properly: the 5.x series for Firefox may not work, but older and newer versions do.
Create a new proxy rule with the pattern *int.triton.aalto.fi* (or jupyter.triton.aalto.fi if you want to connect to that using the proxy). Proxy type: SOCKS5, Proxy URL: localhost, port 8123. DNS through the proxy: on.
SSH to Triton with the -D 8123 option. This starts a proxy on your computer on port 8123, and it has to be running whenever you connect to the notebook.
If you are in Aalto networks: ssh -D 8123 USERNAME@triton.aalto.fi
If you are not in Aalto networks, you need to do an extra hop through another Aalto server: ssh -D 8123 -J USERNAME@kosh.aalto.fi USERNAME@triton.aalto.fi
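If you don't want to pass -D every time, OpenSSH can start the proxy automatically; a minimal sketch for your ~/.ssh/config (the host alias is just an example):
Host triton
    HostName triton.aalto.fi
    User USERNAME
    # Start a SOCKS proxy on local port 8123 on every connection
    DynamicForward 8123
After this, a plain ssh triton opens both the shell and the proxy.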
Now, when you go to any address matching *int.triton.aalto.fi*, you will automatically connect to the right place on Triton. You can use Jupyter like normal. But if the ssh connection goes down, then you can't connect and will get errors, so be aware (especially with jupyter.triton.aalto.fi, which you might expect to always work).
Starting sjupyter
We have the custom-built command sjupyter for starting Jupyter on Triton. First, you must load the sjupyter module:
module load sjupyter
To run in the Triton queue (using more resources), just use sjupyter. This will start a notebook in the interactive Slurm queue. All the normal rules apply: time limits, memory limits, etc. If you want to request more resources, use the normal Slurm options such as -t, --mem, etc. Notebooks can only last as long as your job lasts, and then you will need to restart them. Be efficient with resource usage: if you request a lot of resources and leave the notebook idle, no one else can use them. Thus, try to use the (default) interactive partition, which handles this automatically.
To run on the login node, run sjupyter --local. This is good for small testing and so on that doesn't use too much CPU or memory.
speech2text: easy speech transcription
speech2text is a wrapper we have made around the Whisper tool to make it easier to run for a wide audience. Fundamentally, it's a wrapper around the command line tool plus a set of instructions for transferring data in a way that (hopefully) can't go too wrong.
You can read the instructions here: https://aaltorse.github.io/speech2text/
If you use speech2text, you are using Triton, and any outputs (papers, theses, conference publications) should acknowledge Triton; link them to the "Science-IT" infrastructure in ACRIS once they are published. You might get an email each year reminding you to do this.
Spyder
Spyder is the Scientific PYthon Development EnviRonment: https://www.spyder-ide.org/
On Triton there are two modules that provide Spyder:
- The basic anaconda module: module load anaconda, or
- The neuroimaging environment module: module load neuroimaging
By loading either module you will get access to Spyder.
Using Spyder on Triton
To use Spyder on Triton, you will need an X server on your local machine (in order to display the Spyder GUI), e.g. VcXsrv. You will further need to connect to Triton with X forwarding:
ssh -X triton.aalto.fi
Finally, load the module you want to use Spyder from (see above) and run spyder.
Use a different environment for Spyder
If you want to use Python packages which are not part of the module you run Spyder from, it is strongly suggested to create a virtual environment (e.g. a conda environment). Set up the environment with all the packages you want to use. After that, the following steps will make Spyder use the environment:
1. Activate your environment.
2. Run python -c "import sys; print(sys.executable)" to get the path to the Python interpreter in your environment.
3. Deactivate the environment.
4. Start Spyder.
5. In Spyder, navigate to "Tools -> Preferences" and select "Python interpreter". Under "Use the following Python interpreter", enter the path from step 2.
That will make Spyder use the created python environment.
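In a shell, steps 1-3 might look like this; a minimal sketch assuming a conda environment named myenv (the environment name is a placeholder, and depending on your conda version the commands may be conda activate / conda deactivate instead):
module load anaconda
source activate myenv                          # step 1: activate your environment
python -c "import sys; print(sys.executable)"  # step 2: prints e.g. .../envs/myenv/bin/python
source deactivate                              # step 3: deactivate the environment again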
Tensorflow
- pagelastupdated: 2022-01-09
Tensorflow is a commonly used Python package for deep learning.
Basic usage
First, check the tutorials up to and including GPU computing.
Installing via conda
Have a look here for details on how to create conda environments.
Creating an environment with GPU enabled Tensorflow
To create an environment with GPU-enabled Tensorflow, you can use an environment file like this tensorflow-env.yml:
name: tensorflow-env
channels:
- conda-forge
dependencies:
- tensorflow=*=*cuda*
Here we install the latest tensorflow from the conda-forge channel, with the additional requirement that the build version of the tensorflow package must contain a reference to a CUDA toolkit. For a specific version, replace =*=*cuda* with e.g. =2.8.1=*cuda* for version 2.8.1.
Hint
During installation, conda tries to verify the maximum CUDA version that the installed graphics cards can support, and it will install non-CUDA-enabled versions by default if no card is found (as is the case on the login node, where environments are normally built). This can usually be overcome by specifying explicitly that the packages should be the CUDA-enabled ones. It might however happen that the environment creation process aborts with a message similar to:
nothing provides __cuda needed by tensorflow-2.9.1-cuda112py310he87a039_0
In this instance it might be necessary to override the CUDA settings used by conda/mamba. To do this, prefix your environment creation command with CONDA_OVERRIDE_CUDA=CUDAVERSION, where CUDAVERSION is the CUDA toolkit version you intend to use, as in:
CONDA_OVERRIDE_CUDA="11.2" mamba env create -f cuda-env.yml
This will allow conda to assume that the respective CUDA libraries will be present at a later point and so it will skip those requirements during installation.
For more information, see this helpful post in Conda-Forge’s documentation.
Examples:
Simple Tensorflow/Keras model
Let’s run the MNIST example from Tensorflow’s tutorials:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
The full code for the example is in tensorflow_mnist.py.
One can run this example with srun:
$ wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/tensorflow/tensorflow_mnist.py
$ module load anaconda
$ srun --time=00:15:00 --gres=gpu:1 python tensorflow_mnist.py
or with sbatch by submitting tensorflow_mnist.sh:
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:15:00
module load anaconda
python tensorflow_mnist.py
Do note that by default Keras downloads datasets to $HOME/.keras/datasets.
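If you would rather keep the datasets out of your home quota, Keras respects the KERAS_HOME environment variable; a minimal sketch (we believe this works for recent Keras versions, but verify for yours):
export KERAS_HOME=$WRKDIR/keras-cache   # datasets then go under $WRKDIR/keras-cache/datasets
python tensorflow_mnist.py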
Running simple Tensorflow/Keras model with NVIDIA’s containers
Let’s run the MNIST example from Tensorflow’s tutorials:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
The full code for the example is in tensorflow_mnist.py.
One can run this example with srun:
wget https://raw.githubusercontent.com/AaltoSciComp/scicomp-docs/master/triton/examples/tensorflow/tensorflow_mnist.py
module load nvidia-tensorflow/20.02-tf1-py3
srun --time=00:15:00 --gres=gpu:1 singularity_wrapper exec python tensorflow_mnist.py
or with sbatch by submitting tensorflow_singularity_mnist.sh:
#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=00:15:00
module load nvidia-tensorflow/20.02-tf1-py3
singularity_wrapper exec python tensorflow_mnist.py
Do note that by default Keras downloads datasets to $HOME/.keras/datasets.
Theano
If you’re using the theano library, you need to tell theano to store
compiled code on the local disk on the compute node. Create a file
~/.theanorc
with the contents
[global]
base_compiledir=/tmp/%(user)s/theano
Also make sure that in your batch job script you create this directory before you launch theano, e.g.:
mkdir -p /tmp/${USER}/theano
The problem is that by default the base_compiledir is in your home directory (~/.theano/); if you first happen to run a job on a newer processor, a later job that happens to run on an older processor will crash with an "Illegal instruction" error.
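Putting the pieces together, a batch script might look like this; a minimal sketch (my_theano_script.py and the anaconda module are placeholders, and your ~/.theanorc is assumed to be set up as above):
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=2G
# create the local compile directory before launching theano
mkdir -p /tmp/${USER}/theano
module load anaconda
srun python my_theano_script.py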
VASP
VASP (Vienna Ab initio Simulation Package) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles.
VASP is licensed software, requiring the licensee to keep the VASP team updated with a list of user names. Thus, in order to use VASP, arrange with the "vaspmaster" for your group to be put on the VASP licensed user list. Afterwards, contact your local Triton admin, who will take care of the IT gymnastics, and CC the vaspmaster so that they are aware of who gets added to the list.
For the PHYS department, the vaspmaster is Ivan Tervanto.
For each VASP version, three binaries are compiled. All of them are MPI (and OpenMP) enabled.
vasp_std: The “standard” vasp, compiled with NGZhalf
vasp_gam: Gamma point only. Faster if you use only a single k-point.
vasp_ncl: For non-collinear spin calculations
VASP 6.4.1
The binaries are compiled with the GNU compilers, MKL (incl. ScaLAPACK) and OpenMPI libraries; the modules used are gcc/11.2.0 intel-oneapi-mkl/2021.4.0 openmpi/4.0.5.
Example batch script
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=06:00:00
#SBATCH --mem-per-cpu=1500M
module load vasp/6.4.1
srun vasp_std
Potentials
Potentials are stored at /share/apps/vasp/pot.
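As a usage sketch: VASP reads a single POTCAR containing one entry per species, in the same order as in your POSCAR, so you typically concatenate the per-element files. The subdirectory layout below is illustrative, not verified; check what actually exists under /share/apps/vasp/pot:
# e.g. for a GaAs calculation (paths illustrative)
cat /share/apps/vasp/pot/.../Ga/POTCAR /share/apps/vasp/pot/.../As/POTCAR > POTCAR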
Old VASP versions (obsolete, for reference only!)
These old versions are unlikely to work as they use old MPI and IB libraries that have stopped working due to upgrades over the years.
VASP 5.4.4
The binaries are compiled with the Intel compiler suite and the MKL library; the toolchain module used is intel-parallel-studio/cluster.2020.0-intelmpi.
Example batch script
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=06:00:00
#SBATCH --mem-per-cpu=1500M
module load vasp/5.4.4
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
srun vasp_std
VASP 5.4.1
Currently the binaries are compiled with GFortran instead of Intel Fortran (the Intel Fortran binaries crashed; we don't know why yet). Example batch script:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=06:00:00
#SBATCH --mem-per-cpu=1500M
module load vasp/5.4.1-gmvolf-triton-2016a
srun vasp_std
For each VASP version, there are two binaries compiled with slightly different options:
vasp.mpi.NGZhalf
vasp.mpi
Both are MPI versions. The first one is what you should normally use; it is compiled with the NGZhalf option which reduces charge density in the Z direction, leading to less memory usage and faster computation. The second version is needed for non-collinear spin calculations. The binaries can be found in the directory /share/apps/vasp/$VERSION/ . For those of you who need to compile your own version of VASP, the makefiles used for these builds can be used as a starting point, and are found in the directory /share/apps/vasp/makefiles .
VASP 5.3.5
The binaries are optimized for the Xeon Ivy Bridge nodes, although they will also work fine on the older Xeon Westmere and Opteron nodes. Note that for the moment only the NGZhalf version has been built. If you need the non-NGZhalf version for non-collinear spin calculations please contact triton support. Example job script below:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --time=06:00:00
#SBATCH --mem-per-cpu=2500M
module load vasp/5.3.5
srun vasp.mpi.NGZhalf
The relative time to run the vasptest v2 testsuite on 12 cores (so a full node for Xeon Westmere and Opteron nodes, and 12/20 cores on a Xeon Ivy Bridge node) is for Xeon IB/Xeon Westmere/Opteron 1.0/2.0/2.8. So one sees that the Xeon Ivy Bridge nodes are quite a lot faster per core than the older nodes (with the caveat that the timings may vary depending on other jobs that may have been running on the Xeon IB node during the benchmark).
VASP 5.3.3
The binaries are optimized for the Xeon nodes, although they also work on the Opteron nodes. Some simple benchmarks suggest that the Opteron nodes are a factor of 1.5 slower than the Xeon nodes, although it is recommended to write the batch script such that Opteron nodes can also be used, as the Opteron queue is often shorter. An example script below:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --time=06:00:00
#SBATCH --mem-per-cpu=2500M
module load vasp/5.3.3
srun vasp.mpi.NGZhalf
VASP 5.3.2 and older
The binaries are optimized for the Intel Xeon architecture nodes, and are not expected to work on the Opteron nodes. An example job script is below (Note that it is different from the script for version 5.3.3 and newer above!):
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --time=1-00:00:00
#SBATCH --mem-per-cpu=3500M
module load vasp/5.3.2
srun vasp.mpi.NGZhalf
Potentials
PAW potentials for VASP can be found in the directory /share/apps/vasp/pot. The recommended potentials are the ones in the Apr2012.52 subdirectory. For reference, an older set of potentials dating back to 2003 can be found in the “2003” subdirectory.
Validation
The vasp.mpi.NGZhalf builds have been verified to pass all the tests in the vasptest suite.
Other
Old makefiles
Here are a number of Makefiles copy-pasted from the old Rocks installation. They can be useful in general, though they may require adaptation to the new installation. Please send us a fully working copy if you have one.
See old wiki: https://wiki.aalto.fi/display/Triton/Applications
Rename vasp.x.y.makefile => vasp.x.y/makefile
VisIT
This uses Singularity containers, so you should refer to that page first for general information.
VisIT has been compiled using the build_visit script from the VisIT page on an Ubuntu image. It has a minimal amount of other software installed.
Parallelization is done against Triton’s OpenMPI, so using this container with other OpenMPI modules is discouraged.
Within the container, VisIT is installed under /opt/visit/. PATH is automatically set so that all program calls are available.
Usage
This example shows how you can launch visit on the login node for small visualizations or launch it in multiprocess state on a reserved node. Firstly, let’s load the module:
module use /share/apps2/singularity/modules
module load Visit
Now you can run VisIT with:
singularity_wrapper exec visit
If you want to run VisIT with multiple CPUs, you should reserve a node with
sinteractive
:
sinteractive --time=00:30:00 --ntasks=2 --nodes=1-1
singularity_wrapper exec visit -np 2
Do note the flag --nodes=1-1, which ensures that all of VisIT's processes end up on the same node. Currently VisIT encounters problems when going across node boundaries.
VSCode on Triton
VSCode is a text editor and integrated development environment. It is very popular these days, partly due to its good usability.
Installation
If you are using it on Triton, it's available as a web app through Open OnDemand; see below.
It can also be installed on your own computer, which might be good to do anyway. If you do this, make sure you turn off telemetry if you don’t want Microsoft to get reports of your activity. Search “telemetry” in settings to check and disable (note that this doesn’t fully turn it off).
VSCodium is an open-source build of VSCode (like Chromium is the open-source base of Google Chrome) that disables all telemetry by default and removes non-open-source bits. It is essentially the same thing, but due to Microsoft licenses it can't use the same extension registry as VSCode. It does have a stand-alone extension registry, though.
Security and extensions
As always when using user-contributed extensions, be cautious of what extensions you install. A malicious extension can access and/or delete all of the data available via your account.
VSCode through Open OnDemand
VSCode is available through Open OnDemand, and with this you can select whatever resources you want (memory, CPU, etc) and run directly in the Slurm queue. This means you can directly perform calculations in that VSCode session and it runs properly (not on the login node).
This is useful for getting things done quickly, but running in a web browser can be limited in some cases (interface, lifetime, etc.).
VSCode remote SSH
“Remote SSH” is a nice way to work on a remote computer and provides both editing and shell access, but everything will run directly on the login node on Triton. This is OK for editing, but not for main computations (see the section above or below). To repeat: don’t use this for running big computations.

If you see the remote host name in the lower left corner (e.g. "SSH: triton", or whatever the name of your cluster SSH config is), you are connected to the login node (and should not do big calculations). The exact look may differ between versions.
You can see connection instructions (including screenshots) at the Sigma2 instructions.
VSCode can use a regular OpenSSH configuration file, so you may as well set that up once and it can be used for everything - see SSH for the full story. The basics of SSH to Triton are in Connecting via ssh. A SSH key can allow you to connect without entering a password every time.
VSCode remote SSH host directly to interactive job
Sometimes you want more resources than the login node. This section presents a way to have VSCode directly connect to a job resource allocation on Triton - so you can do larger calculations / use more memory / etc. without interfering with others. Note that for real production calculations, you should use Serial Jobs, and not run stuff through your editor, since everything gets lost when your connection dies.
This section contains original research and may not fully work, and may only work on Linux/Mac right now (but Windows might work too since it uses OpenSSH).
In your ~/.ssh/config, add this block to define a host triton-vscode. For more information on .ssh/config, including what these options mean and what else you might need here, see SSH:
Host triton-vscode
ProxyCommand ssh triton /share/apps/ssh-node-proxycommand --partition=interactive --time=1:00:00
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
User USERNAME
# You also need a triton alias here:
Host triton
HostName triton.aalto.fi
# ... any other thing you need for connecting to triton.
User USERNAME
Now, with VSCode's Remote SSH, you can select the triton-vscode remote. It will ssh to Triton, request a job, and then directly connect to the job. Configure the job requirements in the ProxyCommand line (see Job submission; you can have multiple Host sections for different types of requirements).
Possible issues which may affect usage:
If the ssh connection dies, the background job will be terminated. You will lose your state and not be able to save.
If the job dies due to time or memory exceeded, the same as above will happen: your job will die and there is no time to save.
If you srun from within the job, things get messed up, because the environment variable SLURM_JOB_ID is set from the interactive job that got started. It's hard for us to unset this, so if you are using the terminal to srun or sbatch, you should unset SLURM_JOB_ID first; see the sketch after this list. (Note that Slurm sets many other variables too; make sure they don't interfere with jobs you run from this VSCode session.)
If you request a GPU node or other large resources, they are reserved the whole time even if you aren't using them. Consider this before reserving large resources (unless you close the jobs soon), or you might get an email from us asking if we can help you improve resource usage.
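A minimal sketch of clearing the inherited Slurm variables in a VSCode terminal before submitting new jobs (the bulk unset below is a blunt tool and assumes you need none of the inherited values):
# Clear the job ID so that srun/sbatch start fresh jobs
unset SLURM_JOB_ID
# Or, more aggressively, clear every variable Slurm set for the interactive job
unset $(env | awk -F= '/^SLURM_/ {print $1}')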
Whisper
This uses Singularity containers, so you should refer to that page first for general information.
There are two variants of Whisper available. The "standard" Whisper uses whisper-ctranslate2, a CLI for faster-whisper, which is a reimplementation of OpenAI's Whisper using CTranslate2. The original repository for this project can be found here.
The second variant is whisper-diarization, a fork of faster-whisper with support for speaker detection (diarization). The original repository for this project can be found here.
Of the two, whisper-diarization runs noticeably slower and has less versatile options. Using base Whisper is recommended if speaker detection is not necessary.
Usage (Whisper)
This example shows you a sample script to run Whisper.
$ module load whisper
$ srun --mem=4G singularity_wrapper run your_audio_file.wav --model_directory $medium_en --local_files_only True --language en
The option --model_directory $medium_en tells Whisper to use a local model, in this case the model medium.en, with the path to the model given through the environment variable $medium_en. For a list of all local models, you can run echo $model_names as long as the module is loaded. (These models are pre-downloaded by us, and the variables are defined when the module is loaded.) You can also give it a path to your own model if you have one. The other important option here is --local_files_only True; this stops Whisper from checking whether there are newer versions of the model online. The option --language LANG is not necessary, but Whisper's language detection is sometimes weird.
If you are transcribing a language other than English, use a general model, e.g. $medium. If your source audio is in English, using the English-specific models usually gives a performance gain.
For a full list of options, run:
$ singularity_wrapper run --help
Notes on general Slurm resources:
For memory, requesting roughly 4G for the medium model or smaller, and 8G for the large model, should be sufficient.
When running on CPU, requesting additional CPUs gives a performance increase up to 8 CPUs. Whisper doesn't scale properly beyond 8 CPUs and will actually run slower in most cases.
Running on GPU
singularity_wrapper takes care of making GPUs available to the container, so all you need to do to run Whisper on a GPU is to use the previous command and add one additional flag: --device cuda.
Without this, Whisper will only run on a CPU even if a GPU is available. Remember to request a GPU in the Slurm job.
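Putting it together, the earlier example run on a GPU might look like this (the memory request follows the guidance above; adjust it for your model size):
$ module load whisper
$ srun --mem=8G --gres=gpu:1 singularity_wrapper run your_audio_file.wav --model_directory $medium_en --local_files_only True --language en --device cuda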
Usage (Whisper-diarization)
This example shows you a sample script to run whisper-diarization.
$ module load whisper-diarization
$ srun --mem=6G singularity_wrapper run -a your_audio_file.wav --whisper-model $medium_en
The option --whisper-model $medium_en tells Whisper which model to use, in this case medium.en. If you use the environment variables that come with the module to specify the model, Whisper will run using a local model; otherwise it will download the model to your home directory. For a list of all local models, run echo $model_names with whisper-diarization loaded.
Note that the syntax is unfortunately somewhat different compared to plain Whisper. You need to specify the audio file with the argument -a audio_file.wav, and similarly the syntax to specify the model is different.
For a full list of options, run:
$ singularity_wrapper run --help
Notes on general Slurm resources:
Whisper-diarization requires slightly more memory than plain Whisper. Requesting roughly 6G for the medium model or smaller, and 12G for the large model, should be sufficient.
When running on CPU, requesting additional CPUs gives a performance increase up to 8 CPUs. Whisper doesn't scale properly beyond 8 CPUs and will actually run slower in most cases.
Running on GPU
Compared to plain Whisper, running whisper-diarization on a GPU takes a little more work. singularity_wrapper still takes care of making GPUs available to the container, and you still specify that you want to use a GPU with the flag --device cuda.
Unfortunately whisper-diarization requires multiple models when using a GPU, and there isn't a practical way to use local models for this. For this reason, you should create a symlink from Whisper's cache folder in your home directory to your work directory. This way you avoid filling your home directory's quota.
To do this, run following commands:
$ mkdir -p ~/.cache/huggingface/ ~/.cache/torch/NeMo temp_cache/huggingface/ temp_cache/NeMo/ $WRKDIR/whisper_cache/huggingface $WRKDIR/whisper_cache/NeMo
$ mv ~/.cache/huggingface/* temp_cache/huggingface/
$ mv ~/.cache/torch/NeMo/* temp_cache/NeMo/
$ rmdir ~/.cache/huggingface/ ~/.cache/torch/NeMo
$ ln -s $WRKDIR/whisper_cache/huggingface ~/.cache/
$ ln -s $WRKDIR/whisper_cache/NeMo ~/.cache/torch/
$ mv temp_cache/huggingface/* ~/.cache/huggingface/
$ mv temp_cache/NeMo/* ~/.cache/torch/NeMo
$ rmdir temp_cache/huggingface temp_cache/NeMo temp_cache
This bunch of commands first creates the cache folders if they don't exist and moves any existing files to a temporary directory. Next it creates symlinks to your work directory in place of the original cache directories, and moves all previous files back. This way all downloaded files live in your work directory instead of eating your home quota.
Converting audio files
Whisper should automatically convert your audio file to the correct format when you run it. In case this does not work, you can convert it on Triton using ffmpeg with the following commands:
$ module load ffmpeg
$ ffmpeg -i input_file.audio output.wav
If you want to extract audio from a video, you can instead do:
$ module load ffmpeg
$ ffmpeg -i input_file.video -map 0:a output.wav
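If you want to do the conversion yourself to the 16 kHz mono WAV that Whisper models generally expect, ffmpeg's resampling options can do it; a sketch (normally the automatic conversion handles this for you):
$ module load ffmpeg
$ ffmpeg -i input_file.audio -ar 16000 -ac 1 output.wav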
Examples
Master-Worker Example
The following example shows how to manage a host list using the python-hostlist package and run different tasks for the master task and the worker tasks.
This kind of structure might be needed if one wants to create e.g. a Spark cluster or use some other program that follows the master-worker paradigm but does not use MPI.
It is important to make sure that in case of job cancellation all programs started by the scripts are killed gracefully. In the case of Spark or other programs that initialize a cluster using SSH and then fork a process, these forked processes must be killed after the job allocation has ended.
hostlist-test.sh
:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --nodes=3
#SBATCH --ntasks=5
#SBATCH -o hostlist-test.out
# An example of a clean_up-routine if the master has to take e.g. ssh connection to start program on workers
function clean_up {
echo "Got SIGTERM, will clean up my workers and exit."
exit
}
trap clean_up SIGHUP SIGINT SIGTERM
# Actual script that defines what each worker will do
srun bash run.sh
run.sh
:
#!/bin/bash
# Get a list of hosts using python-hostlist
nodes=`hostlist --expand $SLURM_NODELIST|xargs`
# Determine current worker name
me=$(hostname)
# Determine master process (first node, id 0)
master=$(echo $nodes | cut -f 1 -d ' ')
# SLURM_LOCALID contains task id for the local node
localid=$SLURM_LOCALID
if [[ "$me" == "$master" && "$localid" -eq 0 ]]
then
# Run these if the process is the master task
echo "I'm the master with number "$localid" in node "${me}". My subordinates are "$nodes
else
# Run these if the process is a worker
echo "I'm a worker number "$localid" in node "${me}
fi
Example output:
I'm a worker number 1 in node opt469
I'm a worker number 2 in node opt469
I'm the master with number 0 in node opt469. My subordinates are opt469 opt470 opt471
I'm a worker number 0 in node opt471
I'm a worker number 0 in node opt470
Python OpenMP example
parallel_Python.sh
:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH -o parallel_Python.out
module load anaconda/2022-01
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun -c $SLURM_CPUS_PER_TASK python parallel_Python.py
parallel_Python.py:
import numpy as np
a = np.random.random([2000,2000])
a = a + a.T
b = np.linalg.pinv(a)
print(np.amax(np.dot(a,b)))
Serial R example
#!/bin/bash -l
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --mem=100M
#SBATCH --output=r_serial.out
module load r
n=3
m=2
srun Rscript --vanilla r_serial.R $n $m
args = commandArgs(trailingOnly=TRUE)
n<-as.numeric(args[1])
m<-as.numeric(args[2])
print(n)
print(m)
A<-t(matrix(0:5,ncol=n,nrow=m))
print(A)
B<-t(matrix(2:7,ncol=n,nrow=m))
print(B)
C<-matrix(0.5,ncol=n,nrow=n)
print(C)
C<-A %*% t(B) + 2*C
print(C)
Parallel R example
#!/bin/bash
#SBATCH --time=00:20:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH --output=r_parallel.out
# Set the number of OpenMP-threads to 1,
# as we're using parallel for parallelization
export OMP_NUM_THREADS=1
# Load the version of R you want to use
module load r
# Run your R script
srun Rscript r_parallel.R
library(pracma)
library(parallel)
invertRandom <- function(index) {
A<-matrix(runif(2000*2000),ncol=2000,nrow=2000);
A<-A + t(A);
B<-pinv(A);
return(max(B %*% A));
}
ptm<-proc.time()
mclapply(1:16, invertRandom, mc.cores=as.integer(Sys.getenv('SLURM_CPUS_PER_TASK')))
proc.time()-ptm
When constrained to opt-architecture, run times for different core numbers were:

ncores      | 1       | 2       | 4       | 8
------------|---------|---------|---------|--------
runtime (s) | 380.757 | 182.185 | 125.526 | 84.230
R OpenMP Example
#!/bin/bash
#SBATCH --time=00:15:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=2G
#SBATCH --output=r_openmp.out
module load r
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
time srun Rscript --default-packages=methods,utils,stats R-benchmark-25.R
The benchmark script is available here (more information about it is available on this page).
Python
IPython parallel
An example batch script that uses IPython parallel (ipyparallel) within Slurm. See also the interactive hints on the Python page.
ipyparallel uses global state in your home directory, so you can only run _one_ of these at a time! You can add the --profile= option to name different setups (you could use $SLURM_JOB_ID), but then you will get a growing number of unneeded profile directories at ~/.ipython/profile_*, so this isn't recommended. Basically, ipyparallel is designed more for one-at-a-time interactive use than for batch scripting (unless you do more work…).
ipyparallel.sh is an example Slurm script that sets up ipyparallel. It assumes that most work is done in the engines. It has inline Python; replace that with python your_script_name.py.
#!/bin/bash
#SBATCH --nodes=4
module load anaconda
set -x
ipcontroller --ip="*" &
sleep 5
# Run the engines in slurm job steps (makes four of them, since we use
# the --nodes=4 slurm option)...
srun ipengine --location=$(hostname -f) &
sleep 5
# The actual Python is not run in a job step. This assumes that
# most work happens in the engines.
python3 <<EOF
import os
import ipyparallel
client = ipyparallel.Client()
result = client[:].apply_async(os.getpid)
pid_map = result.get_dict()
print(pid_map)
EOF
Python MPI4py
A simple script mpi4py.py that utilizes mpi4py:
#!/usr/bin/env python
"""
Parallel Hello World
"""
from mpi4py import MPI
import sys
size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
sys.stdout.write(
"Hello, World! I am process %d of %d on %s.\n"
% (rank, size, name))
Running mpi4py.py using only srun:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=4
module load Python/2.7.11-goolf-triton-2016b
mpiexec -n $SLURM_NTASKS python mpi4py.py
Example sbatch script mpi4py.sh
when running mpi4py.py through
sbatch:
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=4
module load Python/2.7.11-goolf-triton-2016b
mpiexec -n $SLURM_NTASKS python mpi4py.py
Running Python with OpenMP parallelization
Various Python packages such as NumPy, SciPy and pandas can utilize OpenMP to run on multiple CPUs. As an example, let's run the Python script python_openmp.py, which calculates the multiplicative inverse of five symmetric matrices of size 2000x2000.
# imports needed to run this excerpt stand-alone
from time import time
import numpy as np

nrounds = 5
t_start = time()
for i in range(nrounds):
    a = np.random.random([2000, 2000])
    a = a + a.T
    b = np.linalg.pinv(a)
t_delta = time() - t_start
print('Seconds taken to invert %d symmetric 2000x2000 matrices: %f' % (nrounds, t_delta))
The full code for the example is in the HPC examples repository.
One can run this example with srun:
wget https://raw.githubusercontent.com/AaltoSciComp/hpc-examples/master/python/python_openmp/python_openmp.py
module load anaconda/2022-01
export OMP_PROC_BIND=true
srun --cpus-per-task=2 --mem=2G --time=00:15:00 python python_openmp.py
or with sbatch
by submitting
python_openmp.sh
:
#!/bin/bash -l
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=1G
#SBATCH -o python_openmp.out
module load anaconda/2022-01
export OMP_PROC_BIND=true
echo 'Running on: '$HOSTNAME
srun python python_openmp.py
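If the libraries in your environment do not pick up the Slurm allocation automatically, you can set the thread count explicitly before the srun line; a hedged addition (most OpenMP-based BLAS builds read this variable):
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK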
Important
Python has a global interpreter lock (GIL), which forces some operations to be executed on only one thread; while these operations are occurring, the other threads are idle. These kinds of operations include reading files and print statements. Thus one should be extra careful with multithreaded code, as it is easy to create seemingly parallel code that does not actually utilize multiple CPUs.
There are ways to minimize the effects of the GIL on your Python code, and if you're writing your own multithreaded code, we recommend that you take this into account.
Detailed instructions
Debugging
Note
Also see Profiling.
Debugging is one of the most fundamental things you do when developing software: debuggers allow you to see inside running programs, and this is a requirement of developing with any software. A debugger is usually one of the first tools created for any reasonable programming language.
Serial code debugging
GDB is the usual GNU debugger.
Note: the latest versions of gcc/gfortran available through the module system require the -gdwarf-2 option along with -g to work with the default gdb command. Otherwise, the default version 4.4 should work normally with just -g.
Valgrind is another tool that helps you to debug and profile your serial code on Triton.
MPI debugging & profiling
GDB with the MPI code
Compile your MPI app with -g, then run GDB for every single MPI rank with:
salloc -p play --nodes 1 --ntasks 4 srun xterm -e gdb mpi_app
You should get 4 xterm windows to follow; from now on you have full control of your MPI app with the serial debugger.
PADB
A parallel debugging tool. It works on top of Slurm and supports OpenMPI or MPICH only (as of June 2015); that is, MVAPICH2 is not supported. It does not require code re-compilation: just run your MPI code normally, and then launch padb separately to analyze the code's behavior.
Usage summary (for full list and explanations please consult http://padb.pittman.org.uk/):
# assume you have your openmpi module loaded already
module load padb
padb --create-secret-file # for the very first time only
# Show all your current active jobs in the SLURM queue
padb -show-jobs
# Target a specific jobid (JOBID and RANK below are placeholders), and report its process state
padb --proc-summary JOBID
# or, for all running jobs
padb --all --proc-summary
# Target a specific jobid, and report its MPI message queue, stack traceback, etc.
padb --full-report=JOBID
# Target a specific jobid, and report its stack trace for a given MPI process (rank)
padb --stack-trace --tree --rank RANK JOBID
# The same, including information about parameters and local variables
padb --stack-trace --tree --rank RANK -Ostack-shows-locals=1 -Ostack-shows-params=1 JOBID
# Target a specific jobid, and report its MPI message queues
padb --mpi-queue JOBID
# Target a specific jobid, and report its MPI process progress (queries in a loop over and over again)
padb --mpi-watch --watch -Owatch-clears-screen=no JOBID
Storage: local drives
Local disks on compute nodes are the preferred place for doing your IO. The general idea is to use network storage as a backend and the local disk for actual data processing.
- At the beginning of the job, cd to /tmp and make a unique directory for your run.
- Copy the needed input from WRKDIR there.
- Run your calculation normally, directing all output to /tmp.
- At the end, copy the relevant output to WRKDIR for analysis and further usage.
Pros
You get better and steadier IO performance. WRKDIR is shared among all users, making per-user performance rather poor.
You save WRKDIR performance for those who cannot use local disks.
You get much better performance when using many small files (Lustre works poorly here).
It saves your quota if your code generates lots of data but you finally need only part of it.
In general, it is an excellent choice for single-node runs (that is, when all the job's tasks run on the same node).
Cons
Not feasible for huge files (>100GB). Use WRKDIR instead.
Small learning curve (must copy files before and after the job).
Not feasible for cross-node IO (MPI jobs). Use WRKDIR instead.
How to use local drives on compute nodes
NOT for long-term data: cleaned every time your job finishes.
You have to use --gres=spindle to ensure that you get a hard disk (note, January 2019: except GPU nodes).
/tmp is a bind-mounted, user-specific directory. The directory is per-user (not per-job); if you get two jobs running on the same node, they share the same /tmp.
Interactively
How to use /tmp when you login interactively
$ sinteractive --time=1:00:00 # request a node for one hour
(node)$ mkdir /tmp/$SLURM_JOB_ID # create a unique directory; here we use the job ID
(node)$ cd /tmp/$SLURM_JOB_ID
... do what you wanted ...
(node)$ cp your_files $WRKDIR/my/valuable/data # copy what you need
(node)$ cd; rm -rf /tmp/$SLURM_JOB_ID # clean up after yourself
(node)$ exit
In batch script
A batch job example that prevents data loss in case the program gets terminated (either because of scancel or due to the time limit).
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=2500M # time and memory requirements
mkdir /tmp/$SLURM_JOB_ID # get a directory where you will send all output from your program
cd /tmp/$SLURM_JOB_ID
## set the trap: when killed or exits abnormally you get the
## output copied to $WRKDIR/$SLURM_JOB_ID anyway
trap "mkdir $WRKDIR/$SLURM_JOB_ID; mv -f /tmp/$SLURM_JOB_ID $WRKDIR/$SLURM_JOB_ID; exit" TERM EXIT
## run the program and redirect all IO to a local drive
## assuming that you have your program and input at $WRKDIR
srun $WRKDIR/my_program $WRKDIR/input > output
mv /tmp/$SLURM_JOB_ID/output $WRKDIR/SOMEDIR # move your output fully or partially
Batch script for thousands input/output files
If your job requires a large number of files as input/output, using the tar utility can greatly reduce the load on the $WRKDIR filesystem. Using methods like this is recommended if you're working with thousands of files.
Working with tar balls is done in the following fashion:
- Determine whether your input data can be collected into analysis-sized chunks that can (if possible) be re-used.
- Make a tar ball out of the input data (tar cf <tar filename>.tar <input files>).
- At the beginning of the job, copy the tar ball into /tmp and untar it there (tar xf <tar filename>.tar).
- Do the analysis there, on the local disk.
- If the output is a large number of files, tar them and copy them out. Otherwise, write the output to $WRKDIR.
A sample code is below:
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --mem-per-cpu=2000M # time and memory requirements
mkdir /tmp/$SLURM_JOB_ID # get a directory where you will put your data
cp $WRKDIR/input.tar /tmp/$SLURM_JOB_ID # copy tarred input files
cd /tmp/$SLURM_JOB_ID
trap "rm -rf /tmp/$SLURM_JOB_ID; exit" TERM EXIT # set the trap: when killed or exits abnormally you clean up your stuff
tar xf input.tar # untar the files
srun input/* # do the analysis, or whatever else
tar cf output.tar output/* # tar output
mv output.tar $WRKDIR/SOMEDIR # copy results back
Storage: Lustre (scratch)
Lustre is a scalable, high-performance file system created for HPC. It allows MPI-IO, but mainly it provides large storage capacity and high sequential throughput for cluster applications. Currently the total capacity is 2PB. The basic idea in Lustre is to spread the data of each file over multiple storage servers. With large (larger than 1GB) files, Lustre will significantly boost performance.
Working with small files
As Lustre is meant for large files, performance with small (smaller than 10MB) files will not be optimal. If possible, try to avoid working with many small files.
Note: Triton has a default stripe of 1 already, so it is by default optimized for small files (but it’s still not that great). If you use large files, see below.
If small files are needed (i.e. source codes) you can tell Lustre not to spread data over all the nodes. This will help in performance.
To see the striping for any given file or directory, you can use the following command:
lfs getstripe -d /scratch/path/to/dir
You cannot change the striping of an existing file, but you can change the striping of new files created in a directory, and then copy the file to a new name in that directory:
lfs setstripe -c 1 /scratch/path/to/dir
cp somefile /scratch/path/to/dir/newfile
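You can verify that the new file picked up the striping setting:
lfs getstripe /scratch/path/to/dir/newfile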
Working with lots of small files
Large datasets which consist mostly of small (<1MB) files can be slow to process because of the network overhead associated with individual files. If this is your case, please consult the compute node local drives page, see the tar example over there, or find some other way to compact your files together into one.
Working with large files
By default Lustre on Triton is configured to stripe a single file over a single OST. This provides the best performance for small files, serial programs, parallel programs where only one process is doing I/O, and parallel programs using a file-per-process file I/O pattern. However, when working with large files (>> 10 GB), particularly if they are accessed in parallel from multiple processes in a parallel application, it can be advantageous to stripe over several OST’s. In this case the easiest way is to create a directory for the large file(s), and set the striping parameters for any files subsequently created in that directory:
cd $WRKDIR
mkdir large_file
lfs setstripe -c 4 large_file
The above creates a directory large_file and specifies that files created inside that directory will be striped over 4 OSTs. For really large files (hundreds of GBs) accessed in parallel from very large MPI runs, set the stripe count to "-1", which tells the system to stripe over all the available OSTs.
To reset back to the default settings, run
lfs setstripe -d path/to/directory
Lustre: common recommendations
Minimize the use of ls -l and ls --color when possible.
Several excellent recommendations are at
https://www.nas.nasa.gov/hecc/support/kb/Lustre-Best-Practices_226.html
http://www.nics.tennessee.edu/computing-resources/file-systems/io-lustre-tips.
They are fully applicable to our case.
Be aware that, being a high-performance filesystem, Lustre still has its own bottlenecks, and even improper usage by a single user can bring the whole system to a halt. See the recommendations at the links above on how to avoid these situations. Common Lustre troublemakers are ls -lR, creating many small files, rm -rf, small random I/O, and heavy bulk I/O.
For advanced users, these slides can be interesting: https://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf
Open OnDemand
Warning
Triton OOD is under development and is available as a preview / for feedback. It may or may not work at any given time as we work on it. It is probably best to use our chat to give quick feedback.
Open OnDemand is a web-based interface to computer clusters. It provides a low-threshold way to do easy work and shell access to do more. It complements, not replaces, the traditional ssh access: just like with Jupyter, it may help you get started, but most people will eventually move towards shell access (even if that shell is via Open OnDemand).
Connecting
Address: http://ood.triton.aalto.fi . Log in with the usual Aalto login. Connections only from Aalto networks or VPN. A pre-existing Triton account is needed.
How to use
The first view is a dashboard that provides an interface to a number of applications:
Shell: Top bar → Clusters → Triton shell access. Or via the file manager.
Files: Top bar → Files → choose your directory. You can upload and download files this way.
Other applications via the main page or Top bar → Interactive Apps → (choose).
Applications
Once logged in, there are ways to start separate applications, for example Jupyter. These run as separate, independent processes.
We have these applications available and supported:
Jupyter
RStudio
Matlab
Spyder
Code Server
…
Choose the partition 'interactive' and the correct account.
Current issues
Apps will be adjusted.
Profiling
Note
Also see Debugging.
You have code, you want it to run fast. This is what Triton is for. But how do you know if your code is running as fast as it can? We are scientists, and if things aren't quantified, we can't do science on them. Programming can often seem like a black box: modern computers are extremely complicated, and people can't predict what actually makes code fast or slow anymore. Thus, you need to profile your code: get detailed performance measurements. These measurements tell you how to make it run faster.
There are many tools for profiling, and it is one of the fundamental skills for any programming language. You should learn how to do a quick profile just to make sure things are OK, even if you aren't trying to optimize: you might find a quick win even if you didn't write the code yourself (for example, noticing that 90% of the time is spent on input/output).
This page is under development, but so far serves as an introduction. We hope to expand it with specific Triton examples.
Summary: profiling on Linux
First off, look at your language-specific profiling tools.
Generic Linux profiling tools (big and comprehensive list, also some presentations): http://www.brendangregg.com/linuxperf.html
Profiling in C and Python (introduction + examples): http://rkd.zgib.net/scicomp/profiling/profiling.html
CPU profiling
This can give you a list of where all your processor time is going, either per-function or per-line. Generally, most of your time is in a very small region of your code, and you need to know what this is in order to improve just that part.
See the C and Python profiling example above.
GNU gprof
gprof is a profiler based on instrumenting your code (built with -pg). It has relatively high overhead, but gives exact information, e.g. the number of times a function is called.
Perf
perf is a sampling profiler, which periodically samples events originating e.g. from the CPU performance monitoring unit (PMU). This generates a statistical profile, but the advantage is that the overhead is very low (single digit %), and one can get timings at the level of individual asm instructions. For a simple example, consider a (naive) matrix multiplication program:
Compile the program (-g provides debug symbols which will be useful later on, at no performance cost):
$ gfortran -Wall -g -O3 mymatmul.f90
Run the program via the profiler to generate profile data:
$ perf record ./a.out
Now we can look at the profile:
$ perf report
# Samples: 1251
#
# Overhead Command Shared Object Symbol
# ........ .............. ............................. ......
#
85.45% a.out ./a.out [.] MAIN__
4.24% a.out /usr/lib/libgfortran.so.3.0.0 [.] _gfortran_arandom_r4
3.12% a.out /usr/lib/libgfortran.so.3.0.0 [.] kiss_random_kernel
So 85% of the runtime is spent in the main program (symbol MAIN__), and most of the rest is in the random number generator, which the program calls in order to generate the input matrices.
Now, lets take a closer look at the main program:
$ perf annotate MAIN__
------------------------------------------------
Percent | Source code & Disassembly of a.out
------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 00000000004008b0 <MAIN__>:
…
: c = 0.
:
: do j = 1, n
: do k = 1, n
: do i = 1, n
: c(i,j) = c(i,j) + a(i,k) * b(k,j)
30.12 : 400a40: 0f 28 04 01 movaps (%rcx,%rax,1),%xmm0
4.92 : 400a44: 0f 59 c1 mulps %xmm1,%xmm0
12.36 : 400a47: 0f 58 04 02 addps (%rdx,%rax,1),%xmm0
40.73 : 400a4b: 0f 29 04 02 movaps %xmm0,(%rdx,%rax,1)
9.65 : 400a4f: 48 83 c0 10 add $0x10,%rax
Unsurprisingly, the inner loop kernel takes up practically all the time.
For more information on using perf, see the perf tutorial on the perf wiki.
Input/output profiling
This will tell you how much time is spent reading and writing data, where, and what type of patterns it has (big reads, random access, etc). Note that you can see the time information when CPU profiling: if input/output functions take a lot of time, you need to improve IO.
/usr/bin/time -v prints some useful info about IO operations and statistics.
Lowest level: use strace to print the time taken in every system call that accesses files. This is not that great:
# Use strace to print the total bytes
strace -e trace=desc $command |& egrep 'write' | awk --field-separator='=' '{ x+=$NF } END { print x }'
strace -e trace=desc $command |& egrep 'read' | awk --field-separator='=' '{ x+=$NF } END { print x }'
# Number of calls only
strace -e trace=file -c $command
Memory profiling
Less common, but it can tell you something about what memory is being used.
If you are making your own algorithms, memory profiling becomes more important because you need to be sure that you are using the memory hierarchy efficiently. There are tools for this.
MPI and parallel profiling
mpiP
mpiP: Lightweight, Scalable MPI Profiling, http://mpip.sourceforge.net/. It collects statistical information about MPI functions. mpiP is a link-time library, which means it can be linked to the object file, though it is recommended that you recompile the code with -g. Debugging information is used to automatically decode the program counters to a source code filename and line number. mpiP will work without -g, but mileage may vary.
Usage example:
# assume you have you MPI flavor module loaded
module load mpip/3.4.1
# link or compile your code from scratch with -g
mpif90 -g -o my_app my_app.f90 -lmpiP -lm -lbfd -liberty -lunwind
# or
mpif90 -o my_app my_app.o -lmpiP -lm -lbfd -liberty -lunwind
# run the code normally (either interactively with salloc or as usual with sbatch)
salloc -p play --ntasks=4 srun mpi_app
If everything works, you will see the mpiP header preceding your program's stdout, and a text report file will be generated in your work directory. The file is small, so no worries about quota. Please consult the link above for an explanation of the file contents. During runtime, one can set the MPIP environment variable to change the profiler's behavior. Example:
export MPIP="-t 10.0 -k 2"
Scalasca
Available through module load scalasca
How big is my program?
Abstract
You can use your workstation / laptop as a base measuring stick: If the code runs on your machine, as a first guess you can reserve the same amount of CPUs & RAM as your machine has.
Similarly for running time: if you have run it on your machine, you should reserve similar time in the cluster.
Natural unit of program size in Triton is 1 CPU & 4 GB of RAM. If your program needs a lot of RAM, but does not utilize the CPUs, you should try to optimize it.
If your program does the same thing more than once, you can estimate that the total run time is \(T \approx n_{\textrm{steps}} \cdot t_{\textrm{step}}\), where \(t_{\textrm{step}}\) is the time taken by each step.
Likewise, if your program runs multiple parameter sets, the total time needed is \(T_{\textrm{total}} \approx n_{\textrm{parameters}} \cdot T_{\textrm{single}}\), where \(T_{\textrm{single}}\) is the time needed to run the program with one set of parameters (a worked example follows this list).
You can also run a smaller version of the problem and try to estimate how the program will scale when you make the problem bigger.
You should always monitor jobs to find out what resources your job actually used compared to what you requested (seff JOBID).
If you aren't fully sure of how to scale up, contact us, the Research Software Engineers, early.
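As a worked example of the two formulas above: if one step takes \(t_{\textrm{step}} \approx 30\) s and you need \(n_{\textrm{steps}} = 1000\) steps, then \(T \approx 1000 \cdot 30\) s \(\approx 8.3\) h; with \(n_{\textrm{parameters}} = 50\) parameter sets, \(T_{\textrm{total}} \approx 50 \cdot 8.3\) h \(\approx 17\) days, which tells you to plan for many parallel jobs rather than one serial run.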
Why should you care?
There are many reasons why you should care about this question.
A cluster environment is shared among multiple users and thus all users will get their own share of the cluster resources.
The queue system will calculate your fair share of the resources based on the resource requirements you have specified.
This means that if you request more than you need, you will waste resources and get fewer resources in the near future.
If, for example, you have a program that takes a day to run a single computation and you have thousands of computations you need to do, you can estimate that you can save a lot of time optimizing the program before starting the computations.
Likewise you can find out that it is not worth the effort to optimize something you will only run once.
You can also find out that something is unfeasible with the method you have chosen before you've invested a lot of time in implementing it.
If, for example, you have a program that you assume should finish in an hour, but it does not, you can infer that either the assumption was incorrect or the program did not behave as it should.
This can often happen when a program is transferred from a desktop environment to the cluster and the program is not aware of this change.
Quite often a program appears to run slower than it should simply because it is: something is holding it back. Recognizing that is important.
How do you measure program size?
The program size can be measured in a couple of ways:
How many CPUs does my program use?
How much RAM does my program use?
How long does it take to run my program?
How many times do I need to run a program?
These questions can seem complicated to answer. Monitoring and profiling is one way of getting concrete numbers, but there are a couple of tricks you can use to get an estimate.
How to estimate CPU and RAM usage?
Simple measuring stick: Your own computer
If you know nothing about your program, you can still probably answer this question:
Does the program run on my own computer?
This can give you a good baseline on how big your program is. In general, you can use the following estimates to approximate your computer:
A typical workstation computer
is about 8 CPUs and 32GB of RAM.
A typical compute node
starts from about 32 CPUs and 128GB of RAM, but they can range up to 128 CPUs and 512GB of RAM.
So if, for example, the program runs on your laptop, you'll know that it should work with a request of 4 CPUs and 16GB of RAM.
In general, you can say that \(\textrm{compute node} \approx 4 \cdot \textrm{workstation} \approx 8 \cdot \textrm{laptop}\), or more.
This will give you a good initial measuring scale.
Getting a better CPU / RAM estimate: check your task manager
A simple way of getting a better estimate is to check your computer’s task manager when you are running the program.
In Windows you can open Task manager from the start menu or by pressing CTRL + ALT + DEL.
In Mac OS X you can use finder to launch Activity monitor or press CMD + ALT + ESC.
In Linux you can use System Monitor to see your processes.
When you're running a program, these tools will easily tell you how many CPUs the processes are using and how much memory. CPU usage is reported as a percentage of total CPU capacity. So if your machine has 4 CPUs and you see a usage of 25%, that means your program is using 1 CPU. Similarly, memory usage is reported as a percentage of the total available memory.
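On Linux you can also check from the command line; a minimal sketch (my_app is a hypothetical process name):
# show CPU and memory usage of all processes named my_app
ps -C my_app -o pid,pcpu,pmem,rss,comm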
In a cluster environment you can use seff JOBID to see how much of the reserved CPU and RAM your program used. For more information, see the monitoring documentation.
Natural unit of scale: 1 CPU = 4GB of RAM
From the previous section we can make an interesting observation: in HPC clusters, there is usually about 1 CPU for every 4 GB of RAM.
This is not a universal law, but a coincidence that has held for a couple of years for economic reasons: these numbers usually give the best "bang for the buck".
In other HPC clusters the ratio might be different, but it is important to know it, because this is the ratio that the Slurm queue system uses when it determines the size of a job. It is very easy to calculate: just divide the available RAM by the number of CPUs.
When determining how big your job is, it is useful to round up to the nearest slot.
If your program requires a lot of RAM but does not utilize multiple CPUs, it is usually a good idea to check whether the RAM usage can be lowered or whether you can utilize multiple CPUs via shared-memory parallelism. Otherwise you're getting billed for resources you're not actively using, which lowers your queue priority.
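You can check the numbers for the cluster yourself; a sketch using Slurm (the columns are node name, CPU count, and memory in MB; the exact values vary by node type):
# list CPUs and memory per node, then divide memory by CPUs
sinfo -N -o "%N %c %m"
# e.g. 128 CPUs and 512000 MB of RAM gives 512000 MB / 128 = 4000 MB, about 4 GB per CPU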
How to estimate execution time?
Simple measuring stick: Your own computer
If you have run the problem on your computer, you'll want to use that as a measuring stick. A good first assumption is that, given the same resources, the program should run in roughly the same time on a compute node.
Programs that do iterations
Usually, a program does the same thing more than once. For example:
Physics simulation codes will usually integrate equations in discrete time steps.
Markov chains do the same calculation for each node in the chain.
Deep learning training does training over multiple epochs and the epochs themselves consist of multiple training steps.
Running the same program with different inputs.
If this is the case, it is usually enough to measure the time taken by a few iterations and extrapolate the total runtime from that.
If the time taken by each step is \(t_{\textrm{step}}\), then the total runtime \(T\) is approximately \(T \approx n_{\textrm{steps}} \cdot t_{\textrm{step}}\).
Do note that if you’re planning on running the same calculation multiple times with different parameters and/or datasets you can estimate that the time needed for running it \(T_{\textrm{total}} \approx n_{\textrm{parameters}} \cdot T_{\textrm{single}}\). In these cases array jobs can often be used to split the calculation into multiple jobs.
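For example, a rough timing workflow in the shell (my_simulation and its --steps flag are hypothetical placeholders):
# time a short run with only 10 steps
time ./my_simulation --steps 10
# if 10 steps take 30 s, one step takes ~3 s;
# a full 10000-step run then needs about 10000 * 3 s = 30000 s, roughly 8.3 hours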
Programs that run a single calculation
For programs that run a single calculation you can estimate the runtime by solving smaller problems. By running a smaller problem on your own computer and then estimating how much bigger the bigger problem is, you can usually get a good estimate on how much time it takes to solve the bigger problem.
For example, let's consider a situation where you need to calculate various matrix operations such as multiplications, inversions, etc. Now if the smaller problem uses a matrix of size \(n^{2}\) and the bigger problem uses a matrix of size \(m^{2}\), you can calculate that the ratio of the bigger problem to the initial problem is \(r = (m / n)^{2}\).
So if solving the smaller problem takes time \(T_{\textrm{small}}\), then you could estimate that the time taken by the bigger problem is at least \(T_{\textrm{large}} \approx r \cdot T_{\textrm{small}} = (m / n)^{2} \cdot T_{\textrm{small}}\).
This estimate is most likely a bad estimate (most linear algebra algorithms do not scale with \(O(n^{2})\) complexity), but it is a better estimate than no estimate at all.
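As a toy example with hypothetical numbers: if the \(1000 \times 1000\) case takes \(T_{\textrm{small}} = 60\) s and the target is a \(4000 \times 4000\) matrix, then \(r = (4000/1000)^{2} = 16\) and \(T_{\textrm{large}} \approx 16 \cdot 60 \textrm{ s} = 960\) s, i.e. at least about 16 minutes.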
It is especially important to notice if your problem scales as \(O(n!)\). These kinds of problems can quickly become very time consuming. Problems that involve permutations such as the travelling salesman problem are famous for their complexity.
If you're interested in the topic, a good introduction is this excellent blog series on Big O notation.
Quotas
Triton has quotas which limit both space usage and the number of files. The quota for your home directory is 10GB, for $WRKDIR it is 200GB by default, and project directories depend on the request (as of 2021). These quotas exist to keep usage from exploding without anyone noticing. If you ever need more space, just ask. We'll either give you more or find a solution for you.
There is an inode (number of files) quota of 1 million, because scratch is not that good at handling many small files. If you have too many small files, see the page on small files.
Useful commands
quota : print your quota and usage.
du -h $HOME | sort -h : print all directories and subdirectories in your home directory, sorted by size. This lets you find out where space is being used. $HOME can be replaced with any other directory (or left off for the current directory). Use du -a to list all files, not only directories.
du -h --max-depth=1 $HOME | sort -h : similar, but only list down to --max-depth levels.
du --inodes --max-depth=1 $HOME | sort -n : similar, but list the number of files in the directories.
rm removes a single file; rm -r removes a whole directory tree. Warning: on scratch and Linux in general (unless backed up), there is no recovery from this! Think twice before you press enter. If you have any questions, come to a garage and get help.
conda clean : cleans up downloaded conda files (but not environments).
Lustre (scratch/work) quotas
Note
Before 2021-09-15, quotas worked differently, and used group IDs rather than project IDs. There were many things that could go wrong and give you “disk quota exceeded” even though there appeared to be enough space.
There are quotas both for users and for projects (/m/$dept/scratch/$project). We use project IDs for this (see the detailed link in See Also), and our convention is that project IDs are the same as the numeric group IDs. The quota command shows the correct quotas (by project) by default, so there is nothing special you should need to do.
If you want to look deeper, check the project ID with lfs project -d {path} and the quotas with lfs quota -hp {project_id}.
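For example (a sketch; the path and project ID are made up, and you may need to give the filesystem mount point as an argument):
# find the project ID attached to a directory
lfs project -d /scratch/work/exampleuser
# show usage and limits for that project ID on /scratch
lfs quota -h -p 12345 /scratch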
Compared to the previous setup, there should be far fewer possible quota problems.
Home directory quotas
Home directories have a quota, and unlike scratch, space for home is much more limited. We generally don't increase home directory quotas, but we can help you move the things that fill up your home directory to scratch (e.g. Python or R packages, which install to home by default).
Project/archive (“Aalto Teamwork”)
The project and archive directories use a completely different system from scratch (though quotas work similarly), even if they are visible on Triton. Quotas for these are managed through your department or IT Services.
See also
Linux project IDs: https://lwn.net/Articles/671627/ (note that this is not exactly the same implementation as Lustre but the general idea).
Singularity Containers
A container is basically an operating system within a file: by including all the operating system support files, software inside of it can run (almost) anywhere. This is great for things like clusters, where the operating system has to be managed very conservatively yet users have all sorts of bleeding-edge needs.
The downside is that it's another thing to understand and manage. Luckily, containers for most software already exist, and using them is not much harder than other shell scripting.
What are containers?
As stated above, the basic idea is that software is packaged into a container which contains essentially the entire operating system. This is done via an image definition file (Dockerfile for Docker, a .def definition file for Singularity), which is itself interesting because it contains a script that builds the whole image automatically - which makes it reproducible and shareable. The image itself is the data which contains the operating system and software.
During runtime, the root file system /
is used from inside the
image and other file systems (/scratch
, /home
, etc.) can be
brought into the container through bind mounts. Effectively, the
programs in the container are run in an environment mostly defined by
the container image, but the programs can read and write specific
files in Triton - all the data you need to operate on. Typically,
e.g. the home directory comes from Triton.
This sounds complicated, but in practice it is not too hard once you see an example and can copy the commands to run. For images managed by Triton admins themselves, this is easy thanks to the singularity_wrapper tool we have written for Triton. You can also run Singularity on Triton without the wrapper, but then you may need to e.g. bind /scratch yourself to access your data.
The hardest part of using containers is keeping track of files inside
vs outside: You specify a command that gets run inside the container
image. It mostly accesses files inside the image, but it can access
files outside if you bind-mount them in. If you ever get confused,
use singularity shell
(see below) to enter the container and see
what is going on.
About Singularity
Docker is the most commonly talked about container runtime, but most clusters use Singularity. The following table should make the reasons clear:
| Docker | Singularity |
|---|---|
| Designed for infrastructure deployment | Designed for scientific computing |
| Operating system service | User application |
| In practice, gives root access to the whole system | Does not give or need extra permissions to the system |
| Images stored in layers in hidden operating system locations, opaquely managed through some commands | One image is one file |
Docker is still a standard image format, and there are ways to convert images between the formats. In practice, if you can use Docker, you can also use Singularity: convert your image and run it using the commands on this page.
Singularity with Triton’s pre-created modules
Some of the Triton modules automatically activate a Singularity image. On Triton, you just need to load the proper module. This will set some environment variables and enable the use of singularity_wrapper (to see how it works, check module show MODULE_NAME).
While the image itself is read-only, remember that /home, /m, /scratch, /l etc. are not. If you edit or remove files in these locations from within the image, that will happen outside the image as well.
singularity_wrapper is written so that when you load a module written for a singularity image, all the important options are already handled for you. It has three basic commands:
singularity_wrapper shell [SHELL] : gives you a shell within the image (specify [SHELL] to say which shell you want).
singularity_wrapper exec CMD : executes a program within the image.
singularity_wrapper run PARAMETERS : runs the singularity image. What this means depends on the image in question - each image defines a "run command" which does something. If you don't know what this is, use the first two instead.
Under the hood, singularity_wrapper does this:
Chooses the appropriate image based on the module version.
Binds the basic paths (-B /l:/l, /m:/m, /scratch:/scratch).
Loads system libraries within images, if needed (e.g. -B /lib64/nvidia:/opt/nvidia).
Sets the working directory within the image (if needed).
Singularity commands
This section describes using Singularity directly, with you managing the image file and running it.
Convert a Docker image to a Singularity image
If you have a Docker image, it has to be in a registry somewhere (Docker images don't exist as standalone files). You can pull it and convert it to a .sif file (remember to change to a scratch folder with plenty of space first):
$ cd $WRKDIR
$ singularity build IMAGE_OUTPUT.sif docker://GROUP/IMAGE_NAME:VERSION
If you are running on your own computer with both Docker and Singularity installed, you can use a local image like this (and then you need to copy it to the cluster):
$ singularity build IMAGE_OUTPUT.sif docker-daemon://LOCAL_IMAGE_NAME:VERSION
This will store the Docker layers in $HOME/.singularity/cache/, which can result in running out of quota in your home folder. In a situation like this, you can clean the cache with:
singularity cache clean
You can also use another folder for your Singularity cache by setting the SINGULARITY_CACHEDIR variable. For example, you can set it to a subfolder of your $WRKDIR with:
export SINGULARITY_CACHEDIR=$WRKDIR/singularity_cache
mkdir $SINGULARITY_CACHEDIR
Create your own image
See the Singularity docs on this.
You create a Singularity definition file NAME.def, and then:
$ singularity build IMAGE_OUTPUT.sif NAME.def
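As a minimal sketch of what such a definition file can look like (the base image and packages are arbitrary examples, not a Triton recommendation):
Bootstrap: docker
From: ubuntu:22.04

%post
    # commands run inside the image at build time
    apt-get update && apt-get install -y python3

%runscript
    # what "singularity run" will execute
    python3 "$@"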
Running containers
These are the "raw" singularity commands. If you use these, you have to configure the images and bind mounts yourself (which is done automatically by singularity_wrapper). If you run module show NAME on a singularity module, you will get hints about what happens.
singularity shell IMAGE_FILE.sif will start a shell inside of the image. This is great for understanding what the image does.
singularity exec IMAGE_FILE.sif COMMAND will run COMMAND inside of the image. This is how you would script it for batch jobs, etc.
singularity run IMAGE_FILE.sif is a lot like exec, but runs a pre-configured command (defined as part of the image definition). This might be useful when using a pre-made image. If you make the image executable, you can run it directly: ./IMAGE_FILE.sif [COMMAND]
Extra arguments you may need:
--bind=/m,/l,/scratch will make the important Triton data filesystems available inside of the container. $HOME is bound by default. You may want to add $PWD for your current working directory.
--nv provides GPU access (though sometimes more is needed).
Examples
Batch script using singularity
#!/bin/bash
#SBATCH --mem=10G
#SBATCH --cpus-per-task=4

# We would run `python /path/to/software/in-image.py $WRKDIR/my-input-file`,
# so instead we run this inside the image:
srun singularity exec --bind /scratch YOUR_IMAGE.sif python /path/to/software/in-image.py $WRKDIR/my-input-file
Writable container image that can be updated
Sometimes it is too much work to completely define an image before building it: it is more convenient to update it incrementally, just like your own computer. You can make a writable image directory using singularity build --sandbox, and then make permanent changes to it by running with singularity [run|exec|shell] --writable. You could, for example, pull an Ubuntu image and then slowly install things in it.
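A minimal sketch of this workflow (the image names are arbitrary examples):
# build a writable sandbox directory from an Ubuntu base image
singularity build --sandbox ubuntu_sandbox/ docker://ubuntu:22.04
# enter it and make permanent changes
singularity shell --writable ubuntu_sandbox/
# later, freeze the sandbox into a single .sif file
singularity build ubuntu.sif ubuntu_sandbox/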
But note these disadvantages:
The image isn’t reproducible: you don’t have the definition file to make it, so if it gets messed up you can’t go back. Being able to delete and reproduce is very useful.
There isn't an efficient, single-file image: instead, there are tens of thousands of files in a directory. You get the problems of many small files. If you run this many times, use singularity build SINGLE_FILE.sif WRITABLE_DIRECTORY_IMAGE/ to convert it to a single file.
MPI in singularity
The Serpent code is a hybrid MPI/OpenMP particle-following code, and can be installed into a container using the definition file sss2.def, which creates a container based on Ubuntu 20.04. In the build process, Singularity clones the Serpent source code and installs the required compilers and libraries, including the MPI library, into the container. Furthermore, data files needed by Serpent are included in the container, and a Python environment with useful tools is also installed. Finally, the Serpent code is compiled, the executable binaries are saved, and the source code is removed.
The container can be used directly with the Triton queue system, assuming the data files are stored in the user's home folder. The file sss2.slurm_cmd can be used as an example. If scratch is used, please add -B /scratch after "exec" in the file.
The key observations to make:
mpirun is called on Triton, which launches multiple Singularity containers (one for each MPI task). Each container directly launches the sss2 executable. Each container can run multiple OpenMP threads of Serpent.
The OpenMPI library (v. 4.0.3) shipping with Ubuntu 20.04 seems to be compatible with the Triton module openmpi/4.1.5.
The Ubuntu MPI library binds all threads to the same CPU. This is avoided by passing the parameter --bind-to none to mpirun.
Infiniband is made available by the mpirun parameter --mca btl_openib_allow_ib.
See also
Singularity documentation: https://docs.sylabs.io/
Singularity docs on building a container: https://docs.sylabs.io/guides/latest/user-guide/build_a_container.html
Singularity documentation from Sigma2 (Norway): https://documentation.sigma2.no/software/containers.html
Small files
Millions of small files are a huge problem on any filesystem. You may think /scratch, being a fast filesystem, doesn't have this problem, but it's actually worse here. Lustre (scratch) acts like an object store and keeps file data separate from metadata. This means that each file access requires multiple network requests, and making a lot of files brings your research (and managing the cluster) to a halt. What counts as a lot? Your default quota is 1e6 files. 1e4 files for a project is not a lot. 1e6 for a single project is.
You may have been directed here because you have a lot of files. In that case, welcome to the world of big data, even if your total size isn't that much! (It's not just about size, but about the difficulty of handling the data with normal tools.) Please read this and see what you can learn, and ask us if you need help.
This page is mostly done, but specific examples could be expanded.
See also:
Data storage on the Lustre file system, especially the bottom.
The problem with small files
You know Lustre is high-performance and fast, but there is a relatively high overhead for accessing each file. Below you can see some sample transfer rates; total performance drops drastically when files get small. (These numbers were for the pre-2016 Lustre system; it's better now, but the same principle applies.) This isn't just a problem when you are trying to read files; it's also a problem when managing, moving, migrating, etc.
| File size | Net transfer rate, many files of this size |
|---|---|
| 10GB | 1100 MB/s |
| 100MB | 990 MB/s |
| 1MB | 90 MB/s |
| 10KB | 0.9 MB/s |
| 512B | 0.04 MB/s |
Why do people make millions of small files?
We understand the reasons people make lots of files: it's convenient. Here are some of the common problems (and alternative solutions) people may be trying to solve with lots of files.
Flat files are a universal format. If you have everything in its own file, then any other program can look at any piece of data individually. It's convenient, and a fast way to get started.
Compatibility with other programs. Same as above.
Ability to use standard unix shell tools. Maybe your whole preprocessing pipeline is putting each piece of data in its own file and running different standard programs on it. It's the Unix way, after all.
Using the filesystem as your index. Let's say you have a program that reads/writes data which is selected by different keys. It needs to locate the data for each key separately. It's convenient to put each of these in its own file: this takes the role of a database index, and you simply open the file with the name of the key you need. But the filesystem is not a good index.
Once you get too many files, a database is the right tool for the job. There are databases which operate as single files, so it's actually very easy.
Concurrency: you use the filesystem as the concurrency layer. You submit a bunch of jobs and each job writes data to its own file; thus you don't have to worry about appending to the same file, database synchronization, locking, etc. This is actually a very common reason.
This is a big one. The filesystem is the most reliable way to join the output of different jobs (for example an array job), and it's hard to find a better strategy. It's reasonable to keep doing this, and combine job outputs in a second stage to reduce the number of files.
Safety/security: the filesystem isolates different files from each other, so if you modify one, there’s less chance of corrupting any other ones. This goes right along with the reason above.
You only access a few files at a time in your day-to-day work, so you never realize there's a problem. However, when we try to manage the data (migrate, move, etc.), the problem comes up.
Note that forking processes has a similar overhead, and many small reads are also non-ideal, though less harmful.
Strategies
Realize you will have to change your workflow. You can't do everything with grep, sort, wc, etc. anymore. Congratulations, you have big data.
Consider the right strategy for your program: a serious program should provide options for this.
For example, some machine learning frameworks provide an option to compress all the input data into a single file that is optimized for reading. This is designed precisely for this type of case. You could read all the files individually, but it'll be slower. So in this case, one should first read the documentation and see that there's a solution. One would take all the original files and create the processed input file. Then, take the original training data and package it together in one compressed archive for long-term storage. If you need to look at individual input files, you can always decompress them one by one.
Split - combine - analyze
Continue like you have been doing: each (array?) job makes different output files. Then, after running, combine the outputs into one file/database. Clean up/archive the intermediate files. Use this combined DB/file to analyze the data in the long term. This is perhaps the easiest way to adapt your workflow.
HDF5: especially for numerical data, this is a good format for combining your results. It is like a filesystem within a file: you can still name your data based on different keys for individual access.
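For example, a combine-and-archive step after an array job could be as simple as this sketch (the file names are hypothetical):
# combine per-job outputs into a single file for long-term analysis
cat results/output_*.csv > combined_results.csv
# archive and remove the many small intermediate files
tar czf results_archive.tar.gz results/ && rm -r results/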
Unpack to local disk, pack to scratch when done.
Main article: Compute node local drives.
This strategy can be combined with many of the other strategies below.
This strategy is especially good when your data is write-once-read-many. You package all of your original data into one convenient archive, and unpack it to the local disk when you need it. You delete it when you are done.
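A sketch of this pattern inside a Slurm job script (the local-disk path and file names are assumptions; see the local drives page for the correct location):
# unpack archived input data to node-local disk
mkdir -p /tmp/$SLURM_JOB_ID && cd /tmp/$SLURM_JOB_ID
tar xf $WRKDIR/dataset.tar.gz
# ... run the computation against the local copy ...
# pack results back to scratch and clean up
tar czf $WRKDIR/results_$SLURM_JOB_ID.tar.gz results/
rm -rf /tmp/$SLURM_JOB_ID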
Use a proper database suitable for your domain (e.g. sqlite): storing lots of small data so that anything can be quickly found and computed on efficiently is exactly what databases do. It can be difficult to make a general-purpose database work for you, but there is a wide variety of special-purpose databases these days. Could one of them be suitable for storing the results of your computation for analysis?
Note that if you are really doing high-performance random IO, putting a database on scratch is not a good idea, and you need to think more.
Consider combining this with local disk: You can copy your pre-created database file to local disk and do all the random access you need. Delete when done. You can do modification/changes directly on scratch if you want.
key-value stores: A string key stores arbitrary data.
This is a more general database, basically. It stores arbitrary data for a certain key.
Read all data to memory.
A strategy for dealing with many small data items: combine all data into one file, read it all into memory, then do the random access in memory.
Compress them down when done.
It's pretty obvious: when you are done with files, compress all of them into one. You keep the archive and can always unpack it when needed. At the very least, do this when you are done with a project: if everyone did this, the biggest problems would be solved.
Make sure you have proper backups for large files; mutating files introduces risks!
If you do use these strategies, make sure you don't accidentally lose something you need. Have backups (even on scratch: back up your database files).
If you do have to keep many small files, check the link above for Lustre performance tuning.
If you have other programs that can only operate on separate files
This is a tough situation; investigate what you can do by combining the strategies above. At the very least you can pack up when done, and copying to local disk while you are accessing the files may be a good idea.
MPI-I/O: if you are writing your own MPI programs, this can parallelize output
Specific example: HDF5 for numerical data, or some database
HDF5 is essentially a database for numerical data. You open a HDF5 file and access different data by path - the path is like a filename. There are libraries for accessing this data from all relevant programming languages.
If you have other data that is structured, there are other databases that will work. For example, sqlite is a single-file, serverless database for relational data, and there are similar things for time series or graphs.
Specific example: Unpacking to local disk
You can see examples at compute node local drives
Specific example: Key-value stores
Let’s say you have written all your own code and want an alternative to files. Instead, use a key-value database. You open one file, and store your file contents under different keys. When you need the data out, you request it by that key again. The keys take the place of filenames. Anytime you would open files, you just access from these key-value stores. You also have ways of dumping and restoring the data if you need to analyze it from different programs.
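As a concrete sketch, the sqlite3 command-line shell can serve as a simple single-file key-value store (readfile() and writefile() are sqlite3-shell helper functions; the file, table, and key names are hypothetical):
# create a key-value table in one database file
sqlite3 data.db "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB);"
# store a file's contents under a key
sqlite3 data.db "INSERT OR REPLACE INTO kv VALUES ('run42', readfile('output_run42.txt'));"
# retrieve it again by key
sqlite3 data.db "SELECT writefile('restored.txt', value) FROM kv WHERE key = 'run42';"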
Performance tuning for small files
See here: Data storage on the Lustre file system
Triton ssh key fingerprints
ssh key fingerprints allow you to verify the server you are connecting to. The usual security model is that once you connect once, you save the key and can always be sure you are connecting to the same server from then on. To be smarter, you can actually verify the keys the first time you connect - thus, they are provided below.
You can verify SSH key fingerprints with a command like:
ssh-keygen -l -E sha256 -f <(ssh-keyscan triton.aalto.fi 2>/dev/null)
Here are the SSH key fingerprints for Triton:
256 SHA256:04Wt813WFsYjZ7KiAyo3u6RiGBelq1R19oJd2GXIAho no comment (ECDSA)
256 SHA256:1Mj2Gpf6iinwni/Yf9g/b/wToaUaOU87szzzCtibj6I no comment (ED25519)
2048 SHA256:glizQJUQKoGcN2aTtp9JtXuJjJtnrKxRD8yImE06RJQ no comment (RSA)
# triton v3:
3072 SHA256:3u8iICwjmvJ/+9YGxqqK+3r7FmrDflcgpoGl5ygtAWw login4.triton.aalto.fi (RSA)
256 SHA256:OqCehC2lbHdl8mYGI/G9vlxTwew3H3KrvxKDkwIQy9Y login4.triton.aalto.fi (ECDSA)
256 SHA256:ibL4dBsdrwRjbJCBWL1J5p/Sg4PGHWxTG6HF65yPcps login4.triton.aalto.fi (ED25519)
and the same but with md5 hashes:
256 MD5:ac:61:86:86:e1:11:29:f5:46:23:d8:25:00:8a:7b:f0 no comment (ECDSA)
256 MD5:1d:e7:c9:f6:92:a1:c0:65:10:97:d7:72:7d:4c:82:5a no comment (ED25519)
2048 MD5:a4:73:89:ae:8c:a5:ea:2a:04:76:cc:0b:6a:f7:e6:9a no comment (RSA)
# triton v3:
3072 MD5:2e:54:9f:f8:05:0e:b6:75:3a:b6:d4:88:e9:ac:1c:18 login4.triton.aalto.fi (RSA)
256 MD5:24:fc:03:f8:bc:20:ae:02:97:b4:3d:a1:97:44:f6:1f login4.triton.aalto.fi (ECDSA)
256 MD5:d0:63:0e:2c:2b:8d:59:d9:37:88:53:3d:54:b3:4e:69 login4.triton.aalto.fi (ED25519)
Or these host keys can be copied and pasted directly into your ~/.ssh/known_hosts file:
triton.aalto.fi ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDk8MvTSB2gYZf9Y969vhMczdGSO+rNGZQhZLUGMkMduq4q+b/LpHCn/yH1JN8NWeIDt8NELdnl+/0hmk/zk7IHxtnPvNbZuAYO1T1Hh7Kk72zQFOESHqmbYcPH5SDf12XfNYJ6cQIqHRaF4QT483+f9fvUlp7E+MKQlr3+NreKm4AHdTcHjqW75r1Mh/z0q9Qoqdgn3gDCzmN6+Y0aGyf4wICMJlKUBQP0muqSfYWX43StaPh+hoOQFYOiK1jOVEBY/HFXOuDzgCCG2b9qWhTrA3svcSKK4E6X76sXOR+8FTbC7u9xnLgm+903+zsGfsEQY2eNXfR7YChNxz4y5ASf
triton.aalto.fi ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBAZvw6Bgs+cPGFjwqMABAGC+cG2bBYR69+Hc5ChxQhwNwCW1zCg6w/pAerbr+A6IzJDx8uN03bcTZj+xzLH2kLE=
triton.aalto.fi ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDumqy+fbEwTOtyVlPGqzS/k4i/hJ8L+kUDf6MpWO1OI
# triton v3
login4.triton.aalto.fi ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCdoiR+g/NcJJ2MufNgZapo9x90KtK7SHTzlEscsCELwM8mVrcyB1bw4dVxl2V52p8NOyCCgNnsCMPsVHZ9xXGrmiULQQ7Caw/7zKG7LAgo3mh0HZ/5Vl9w+McMaEzP5ag4l7C3mOU38U+ZWXS2d+YBl3dFLikRSJFZdljLOmXPe0j3vRvBcPWz7i6ftgMd9CKkBsBxJW8GPWzBIIU3wkhGpGhJlIpivu19JZ7wCCELD68qnJCweWxgnB0xvpep1mdYgkaZXRUnLDyStQuWzN9UpfUhY/lpmWs+xWHoCk4B1FSSoLobZv0LQXG55eKzsPQg+avURg4nZksm9j2VvCH+581HuExSSIs60zNHIHTfZARI0sFi5Ygsf5a+cUE//SmjBdTcp+zLzH6cE/Kt1DKcz77o36F1Jd86hhLBJjkPyd6Z7+dMbrxXqDU9JYjnrTcrblTjdbnCllcIpvfDbtAQbo7L3mcLhKGgvrWlznthrctI2wWcfwFaV9xukspe048=
login4.triton.aalto.fi ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBLF3oitopSKxSNvugA8CWEFwjsrMCEejPgXODoHTbPWo03wW2I2b87Or/g30uppTragZvt6V+7D886FOxaHdEgU=
login4.triton.aalto.fi ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEiAKYEcl1Dfo/FKfQVXvtEDhP7sywCld6H27v4tl6uX
There is also a page for ssh host keys for the Aalto shell servers kosh, lyta, brute, force
Storage
See also
These pages have details that go beyond this page:
Data storage - basic prerequisite tutorial
Remote access to data - basic prerequisite tutorial
This page gives an overview of more advanced storage topics. You should read the storage and remote data tutorials first.
Checklist
Do any of these apply to you? If so, consider your situation and ask us for help!
If you have been sent this checklist because your jobs may be doing a lot of IO, don't worry. It's not necessarily a problem, but please go through this checklist and let us know what applies to you so we can give some recommendations.
Many small files being accessed in jobs (hundred or more).
Files with extremely random access, in particular databases or database-like things (hdf5).
Files being read over and over again. Alternatives: copy to local disks, or read once and store in memory.
Number of files growing, for example all your runs have separate input files, output files, Slurm output files, and you have many runs.
Constantly logging to certain files, writing to files from many parallel jobs at the same time.
Reading from single files from many parallel jobs or threads at the same time.
Is all your IO concentrated at one point, or spread out over the whole job?
(And if we've asked you specifically about your jobs, could you also describe what kind of job it is, what kind of disk reads and writes happen, and in what kind of pattern? Many small files, a few large ones, reading the same files over and over, etc. How is it spread out across jobs?)
If you think your IO may have bad patterns or even you just want to talk to make sure, let one of the Triton staff know or submit an issue in the issue tracker.
Checking your jobs’ IO usage
You can check the total disk read and write of your past jobs using:
# All your recent jobs:
sacct -o jobid%10,user%8,jobname%10,NodeList,MaxDiskRead,MaxDiskWrite -u $USER
# A single jobid
sacct -o jobid%10,user%8,jobname%10,NodeList,MaxDiskRead,MaxDiskWrite -j $jobid
These statistics are calculated for the whole node and thus include IO caused by other jobs running on the same node while your job is running.
More advanced tools are being tested and will be made available once they are ready.
Loading data for machine learning
As we've said before, modern GPUs are super data-hungry when used for machine learning. If you try to open many files to feed them the data, "you're going to have a bad time". Luckily, different packages have different solutions to the problem.
In general, at least try to combine all of your input data into some sort of single file that can be read in sequence.
Try to do the least amount of work possible in the core training loop: any CPU usage, printing, logging, preprocessing, or postprocessing reduces the amount of time the GPU is working, unless you do it properly (Amdahl's law).
Tensorflow: data input pipelines
PyTorch: data loaders
(more coming later)
Remote workflows at Aalto
Note
The more specific remote access instructions for scicomp are at Remote Access (a recent email had duplicate links to this page). This page explains the options, including other systems.
Video
Watch this in video form (winter 2022 kickstart)
How can you work from home? For that matter, how can you work on more than your desktop/laptop while at work? There are many options which trade off between graphical interfaces and more power. Read more for details.
You have most likely created your own workflow to analyse data at Aalto, and most likely you are using a dedicated desktop workstation in Otaniemi. However, with the increased mobility of work and recent global events that encourage teleworking, you might be asking yourself: "how do I stop using my workstation at the department and get analysis/figures/papers done from home?".
Remote data analysis workflows might not be familiar to everyone. We list a few possible cases here; this page will expand according to the needs and requests of users.
What’s your style?
If you need the most power or flexibility, use Triton for your data storage and computation. To get started, you can use Jupyter (4) and VDI (3) which are good for developing and prototyping. Then to scale up, you can use the Triton options: 6, 7, 8 which have access to the same data. (Triton account required for 4-8).
If you need simple applications with a graphical interface, try 3 (VDI).
If you use your own laptop/desktop (1, 2), then it’s good for getting started but you have to copy your data and code back and forth once you need to scale up.
< Overview of all the computing options at Aalto University >
Summary table for remote data analysis workflows
Good for data security: 3, 4, 5, 6, 7
Good for prototyping, working on the go, doing tests, interactive work: 1, 2, 3, 4, 5
Shares Triton data (e.g. scratch folders): 3, 4, 5, 6, 7
Easy to scale up, shares software, data, etc: 4, 5, 6, 7
Largest resources available: 7 (medium: 6)
| Workflow | Pros | Cons | Recommendation | Triton data Y/N |
|---|---|---|---|---|
| 1. Own laptop/desktop computer | Can work from anywhere. Does not require internet connection. You are in control. | Not good for personal or confidential data. Computing resources might not be enough. Accessing large data stored at Aalto remotely might be problematic - you will end up having to copy a lot. You have to manage software yourself. | Excellent for prototyping, working on the go, doing tests, interactive work (e.g. making figures). Don't use it with large data or confidential/personal data. | N |
| 2. Aalto laptop | Same as above, plus the same tools available as an Aalto employee. | Same as above. | Same as above. | N |
| 3. Remote virtual machine with VDI | Computing happens on remote. Data access happens on remote, so it is more secure. | Computing resources are limited. | Excellent for prototyping, working on the go, doing tests, interactive work (e.g. making figures). More secure access to data. | Y |
| 4. Aalto Jupyterhub | Cloud based - resume work from anywhere. Includes command line (#6) and batch (#7) easily. Same data as seen on Triton (/scratch/dept/ and /work/ folders). | Jupyter can become a mess if you aren't careful. You need to plan to scale up with #7 eventually, once your needs increase. | Excellent for prototyping, working on the go, doing tests, interactive work (e.g. making figures). Secure access to data. Use if you know you need to switch to batch jobs eventually (#7). | Y |
| 5. Interactive graphical session on Triton | Graphical programs. | Lost once your internet connection dies; needs a fast internet connection. | If you need specific graphical applications which are only on Triton. | Y |
| 6. Interactive command line session on Triton | Works from anywhere. Can get lots of resources for a short time. | Limited time limits; must be used manually. | A general workhorse once you get comfortable with the shell - many people work here + #7. | Y |
| 7. Non-interactive batch computing on Triton | Largest resources, bulk computing. | Need to script your computation. | When you have the largest computational needs. | Y |
| 8. Non-interactive batch HPC computing at CSC | Similar to #7 but at CSC. | Similar to #7. | Similar to #7. | N |
1. Own laptop/desktop computer
Description: here you are the administrator. You might be working from a cafe with your own laptop, or from home with a desktop. You should be able to install any tool you need. As an Aalto employee you get access to many nice commercial tools for your private computers. Visit https://download.aalto.fi/index-en.html and https://aalto.onthehub.com/ for some options.
Pros: Computing freedom! You can work anywhere, you can work when there is no internet connection, you do not share the computing resources with other users so you can fully use the power of your computer.
Cons: if you work with personal or confidential data, the chances of a data breach increase significantly, especially if you work from public spaces. Even if you encrypt your hard disks (see https://www.aalto.fi/en/cyber-security-hub-under-construction/aalto-it-securitys-top-10-tips-for-daily-activities) and even if you are careful, you might forget to lock your computer, or somebody behind you might see which password you type. Furthermore, personal computers have limited resources when it comes to RAM/CPUs/GPUs. When you need to scale up your analysis, you want to move it to an HPC cluster rather than leaving scripts running for days. Finally, although you can connect your Aalto folders to your laptop (see Remote Access and Remote access to data), when the data size is too big, it is very inefficient to analyse large datasets over the internet.
Recommendation: your own computer is excellent for prototyping data analysis scripts, working on the go, doing tests or new developments. You shouldn't use this option if you are working with personal or other confidential data, or if your computational needs are much bigger.
2. Aalto laptop
Description: as an Aalto employee, you are usually provided with a desktop workstation or an Aalto laptop. With an Aalto laptop you can apply for administrator rights (link to the form), and basically everything you have read for option 1 above is valid in this case as well. See "Aalto {Linux|Mac|Windows}" in scicomp's Aalto section at https://scicomp.aalto.fi/aalto/.
Pros/Cons/Recommendation: see option 1 above. But, when on Aalto networks, you have easier access to Aalto data storage systems.
3. Remote virtual machine with VDI
Description: you might be working with very large datasets or with confidential/personal data, so that you cannot or do not want to copy the data to your local computer. Sometimes you use many computers, but would like to connect remotely to "the same computer" where a longer analysis script might be crunching numbers. Aalto has a solution called VDI, https://vdi.aalto.fi (description at aalto.fi), where you get remote access to a dedicated virtual machine within the web browser. Once logged in, you can pick whether you prefer Aalto Linux or Aalto Windows, and then you see the same interface that you would see if you logged in from a dedicated Aalto workstation. To access Triton data from the Linux one, use the path /m/{dept}/scratch/ (just like Aalto desktops).
Pros: the computing processes are not going to run on your local computer; computing happens on remote, which means that you can close your internet connection, have a break, and resume the work where you left it. There is no need to copy data locally, as all data stays on remote and is accessed as if you were at a desktop computer on campus.
Cons: VDI machines have limited computing power (2 CPUs, 8GB of RAM), so they are great for small prototyping, but for large scale computation you might want to consider the Aalto Triton HPC cluster. The VDI session is not kept alive forever: if you close the connection you can still resume the same session within 24h, but after that you are automatically logged out to free resources for others. If you have a script that needs more than 24h, you might want to consider Triton as well.
Recommendation: VDI is excellent when you need a graphical interactive session and access to large data or to personal/confidential data without the risks of a data breach. Use VDI for small analyses or interactive development; we do not recommend it when the execution time of your scripts grows beyond a 7-hour working day.
4. Aalto Jupyterhub
Description: Jupyter notebooks are a way of interactive, web-based computing: instead of either scripts or interactive shells, the notebooks allow you to see a whole script + output and experiment interactively and visually. They are good for developing and testing things, but once things work and you need to scale up, it is best to put your code into proper programs. Triton’s JupyterHub is available at https://jupyter.triton.aalto.fi . Read more about it at: https://scicomp.aalto.fi/triton/apps/jupyter.html. Triton account required.
Pros: JupyterHub has similar advantages to #3, although data and code are accessed through the JupyterHub interface. In addition, things can stay running in the cloud. Although it can be used with R or Matlab, Python users will most likely find this a very familiar and comfortable prototyping environment. Similar to the VDI case, you can resume your workflow (there are sessions of different lengths). You can also access the Triton shell and batch systems (#6, #7) from the Jupyter interface, and it's easy to scale up and use them all together.
Cons: You are limited to the Jupyter interface (but you can upload/download data, and integrate with many other things). Jupyter can become a mess if you aren’t careful. Computationally, an instance will always have limited CPUs and memory. Once you need more CPU/RAM, look into options #6 and #7 - they work seamlessly with the same data, software, etc.
Recommendation: Good for exploration and prototyping, access to large dataset, access to confidential/personal data. For more computational needs, be ready to switch to batch jobs (#7) once you are done prototyping.
5. Interactive graphical session on Triton HPC
Description: sometimes what you can achieve with your own laptop or with VDI is not enough when it comes to computing resources. However, your workflow does not yet allow you to go fully automatic, as you still need to manually interact with the analysis process (e.g. point-and-click analysis interfaces, doing development work, making figures, etc.). An option is to connect to triton.aalto.fi with a graphical interface. This is usually done with ssh -X triton.aalto.fi; for example, you can do it from a terminal within a VDI Linux session. Once connected to the Triton login node, you can request a dedicated interactive node with the command sinteractive, and you can also specify the amount of CPU or RAM you need (link to sinteractive help page). Triton account required.
Pros: This is similar to the VDI case above (#3) without the computing limitation imposed by VDI.
Cons: if you connect to triton.aalto.fi from your own desktop/laptop, your internet connection might limit the speed of the graphical session, making it very difficult to use graphical IDEs or other tools; in that case move to VDI, which optimises how the images are transferred over the internet. Sinteractive sessions cannot last more than 24 hours; if you need to run scripts with high computational requirements AND long execution times, the solution is to go fully non-interactive using Triton with Slurm batch jobs (case #7).
Recommendation: this might be one of the best scenarios for working remotely with an interactive graphical session. Although you cannot keep the session open for more than 24 hours, you can still work on your scripts/code/figures interactively without any limitation and without any risks of data breaches.
6. Interactive command line only session on Triton HPC/dept workstation
Description: sometimes you do not really need a graphical interface because you are interactively running scripts that do not produce or need graphical output. This is the same case as sinteractive above, but without the limitation of the 24h session. The best workflow is to: 1) connect to Triton: ssh triton.aalto.fi; 2) start a screen/tmux session that can be detached/reattached in case you lose the internet connection or need to leave the interactive script running for days; 3) request a dedicated interactive terminal with the command srun -p interactive --time=HH:MM:SS --mem=nnG --pty bash (see other examples at https://scicomp.aalto.fi/triton/tut/interactive.html or https://scicomp.aalto.fi/triton/usage/gpu.html for interactive GPU); 4) get all your numbers crunched and remember to close the session once you are done. Please note that if you have a dedicated Linux workstation at a department at Aalto, you can also connect to your workstation and use it as a remote computing node fully dedicated to you. The resources are limited to your workstation, but you won't have the time constraint or the need to queue for resources if Triton's queue is overcrowded. Triton account required.
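As a concrete sketch of steps 1-3 (the session name and resource numbers are arbitrary examples):
ssh triton.aalto.fi
tmux new -s analysis                                     # step 2: detachable session
srun -p interactive --time=04:00:00 --mem=8G --pty bash  # step 3: interactive shell on a compute node
# ... run your scripts; detach with Ctrl-b d, reattach later with: tmux attach -t analysis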
Pros: when you do not need a graphical interface and when you need to run something interactively for days, this is the best option: high computing resources, secure access to data, persistent interactive session.
Cons: when you request an interactive command line session you are basically submitting a slurm job. As with all jobs, you might need to wait in the queue according to the amount of resources you have requested. Furthermore, jobs cannot last more than 5 days. In general, if you have an analysis script that needs more than 5 days to operate, you might want to identify if it can be parallelized or split into sub-parts with checkpoints.
Recommendation: this is the best option when you need long-lasting computing power and large data/confidential data access with interactive input from the user. This is useful once you have your analysis pipeline/code fully developed so that you can just run the scripts in command line mode. Post processing/figure making can then happen interactively once your analysis is over.
7. Non-interactive batch computing on Triton HPC
Description: this is the case when no interactive input is needed to process your data. It is extremely useful when you are going to run the same analysis code hundreds of times. Please check the more detailed descriptions at https://scicomp.aalto.fi/triton/index.html and, if you haven't, go through the tutorials at https://scicomp.aalto.fi/triton/index.html#tutorials. Triton account required.
Pros: when it comes to large scale data analysis, this is the most efficient way to do it. Having a fully non-interactive workflow also makes your analysis reproducible as it does not require any human input which can sometimes be the source of errors or other irreproducible/undocumented steps.
Cons: as this is a non-interactive workflow, it is not recommended for generating figures or for graphical tools that do not allow "batch" mode operation.
Recommendation: this is the best option when you need long-lasting parallel computing power and large data/confidential data access. It is also recommended from a reproducibility/replicability perspective since, by fully removing human input, the workflow can be made fully replicable.
8. Non-interactive batch HPC computing at CSC
Description: this case is similar to #7. You can read/learn more about this option at https://research.csc.fi/guides
Pro/Cons/Recommendation: see #7.
See also
Shortcuts
Scientific computing resources
SCIP – Scientific Computing in Practice courses: organized by SciComp. Including Triton kickstarts and many others
Hands-on Scientific Computing: map of important computing skills
Software Carpentry (scientific computation basics) and Code Refinery (more focused on programming techniques)
General links
CSC - Finland’s academic computing center.
FGCI user's guide at CSC: a general guide to FGCI resources. Triton is one of them.
CSC HPC guides: for the Triton-like clusters at CSC. Similar setup, so the examples and instructions can be useful.
Cheatsheets: Triton
Aalto Research Software Engineers
Skills to do science are different than skills to write good research code. The Aalto Research Software Engineers (AaltoRSE) provide support and mentoring to those using computing and data so that everyone can do the best possible work.
Research Software Engineers
The Aalto Research Software Engineers (RSEs) provide specialist support regarding software, computing, and data. As research becomes more digital and computer-dependent, the prerequisite knowledge grows larger and larger, and we exist to help you fill that gap.
For anything related to custom software development, computational research, data management, workflow automation, scaling-up, deployment of public previews, collaborative work, reproducible research, optimization, high-performance computing, and more, we can:
Do it for you: You need some custom technical software/solution. We do it for you, you get straight to your work.
Do it with you: We co-work with your group, teaching while we go along.
Make it reusable: You already have something, but it doesn’t work for others.
Plan your ambitions: Figure out how far you can reach in your next project or grant.
Instead of, or in addition to, hiring your own intern, postdoc, etc. to struggle with certain issues, we can help instead. We consist of experienced researchers who have broad experience with scientific computing (programming, computing, data) for our academic work, and thus can seamlessly collaborate on research projects. We can also do consultation and training. You will have more impact since your work is more reusable, open, and higher quality. We can work on existing projects or you can write us directly into your grant applications.
Service availability: Garage support is available to researchers at Aalto. We serve projects from all Aalto schools thanks to IT Services grants, but our main funding currently comes from the School of Science. For more information, see Unit information.
Contact
For a quick chat to get started with any kind of project or request any type of support, come to our daily garage, every workday online at 13:00. Or contact us by email at rse-group at aalto.fi, or fill out our request form. See requesting RSE for more.
About our services
For researchers and research groups
You program or analyze data in your daily work, and you know something is missing: your code and data are less organized, less efficient, and less managed than others', and it's affecting the quality of your work. Or maybe you don't know how to start your project, or how to publish it. You're too busy with the science to have time to focus on the computing.
To find out more or make a request, contact us.
Case study: preparation for publication
A group is about to publish a paper about a method, but their code is a bit messy. Without easy-to-use, (relatively) high-quality code, they know their impact will be minimal. They invest in a few days of RSE work in order to help adopt best practices and release their method as open source.
Case study: external grant
A PI has gotten a large external grant, and as part of it they need some software development expertise. The time frame is four months, but they can't hire a top-quality person on an academic salary for that short a time. They contact the Aalto RSE group (either before the grant, or while it is running) and use our specialists four days per week.
Case study: improve workflow
A group of researchers all work on similar things, but independently, since their backgrounds are in science, not software development. They invite an RSE for a quick consultation to help them get set up with version control and to show a more modular way to structure their code, so they can start some real collaborations, not just talking. This is the first step to more impact (and open science) from their work.
Case study: sustainability of finished projects
A project has ended and the main person who managed the code/analysis pipeline has left to continue their career somewhere else. You wish to replicate and extend the previous work, but your only starting point is a folder with hundreds of files and no clear instructions/documentation. Aalto RSEs can help you reuse and recycle the previous code, document it, and extend it so that it is more sustainable for future projects.
Case study: outreach and impact
ChatGPT wasn’t in the news just because it was good - it’s because it had an excellent interface for the public to test it. Developing and running these services requires a different set of skills than research, and Aalto RSEs can help to make and deploy these services.
What we do
Our RSEs may provide both mentorship and programming-as-a-service for your projects. Are you tired of research being held back by slow programming? We can help.
You can request our help for a few hours, to consult and co-work on a project. Our goal is primarily teaching and mentoring, to help you help yourselves in the long run. We'll point you in the right direction and show you where to look next.
You can also request longer-term work as a service. This can be producing or modifying some software for you, or whatever you may need. If it’s short, it’s covered under basic funding, and if it is longer it is expected to be paid from your grants. (Need someone for a few months for your grant? We can do that.)
Note
Master’s and Bachelor’s students
The RSE service is intended for researchers, but students can be researchers if they are involved in a research project. To get started on anything longer than a short consultation, we would need to meet with your supervisor.
Short-term examples
The format could be personal work, a lecture, or a group seminar followed by a hands-on session, for example.
Setting up a project in version control with all the features. This also includes version control of data.
Preparing code or data for release and publication
FAIR data (findable, accessible, interoperable, reusable) - consultation and help.
Creating or automating a workflow, especially those processing data or running simulations
Optimizing some code - both for speed and/or adaptability
Efficiently storing data for intensive analysis. Data replication and management.
Making existing software more modular and reusable
Help properly using, for example, machine learning library pipelines, instead of hacking things together yourself
Setting up automatic software testing
Transforming projects from individual to collaborative - either within a group, or open source.
Generalized “code clean-up” which takes a project from development to stabilized
More involved examples
These would combine co-working, mentoring, and independent work. We go to you and work with you.
Developing or maintaining specific software, services, demos, or implementations.
Software development as a service
Software support that lasts beyond the time frame of a single student’s attention
Adding features to existing software
Contributing to some other open source software you need for your research
Paid project service
In the dedicated service, your research group pays and we will do whatever you want (in particular the more involved examples above). Still, our model is as much co-working as consulting: we want to improve your own skills so that you can still be productive afterwards.
The research group must pay for this service, but the rate is essentially at-cost and with minimal bureaucratic overhead.
Free basic service
In order to help everyone and avoid microtransactions, departments/schools/etc. can sponsor a basic service, which provides a few hours or days of close support to improve how you work (especially for the “short-term examples” above).
One of our trained RSEs will work with you for a short period to begin or improve your project. The goal is not to do it for you, but to show you by example so that you can do it yourself later.
How to contact us and request help
To request a service, see the request area.
Requests are prioritized according to:
Short-term “urgent help” for as many projects as possible
Priority for projects and units which provide funding
Strategic benefit
Long-term impact to research (for example, improved skills)
Diversity and balance
For units such as departments, schools, or flagship projects
Our service is funded by departments and schools, and members of these units can receive our services free of charge for a short period of time (in proportion to their unit’s share of the funding). In addition to the basic service, researchers and group leaders can request long-term support which they pay for themselves.
By joining the Research Software Engineering service, you provide the highest-quality computational tools to your researchers, enabling the best possible research and attracting the best possible candidates. You fund a certain amount of time, and actual cost decreases when groups pay for long-term service themselves. For both short and long-term projects, our surveys indicate a significant efficiency: (researcher time saved) ≥ 5 × (time we spend).
Case study: Systematic improvements
Your department has a lot of people doing little bits of programming everywhere, but everyone is doing things alone. What if they could work together better? By joining the RSE program as a unit, your staff can get rapid help to understand tools to make their programming/data work better. After a few years, you notice a dramatic cultural shift: there is more collaboration and higher-quality work. Perhaps you already see a change in your KPIs.
Benefits
Benefits to schools/departments:
Increase the quality and efficiency of your research by providing the best possible tools and support.
Provide hands-on technical research services to your community at a higher level than basic IT (see Scicomp garage).
More societal impact, for example ChatGPT-type preview interfaces.
Help with data management, open science, FAIR data - be more competitive for funding and help get value out of your unit’s data.
You will be able to set priorities for your funding: for example, whether to focus on a certain strategy, a wide variety of projects, high-impact projects, etc.
Benefits to groups:
Receive staff/on-call software development expertise within your group, without having to make a separate hire, and at less than a full-time equivalent. We don’t disappear right after your project.
Instead of just one person, you have the resources of our whole team available to you.
Your researchers focus on their science while improving their computational skills by co-working with us.
How to join
The RSE program is a part of Aalto Science-IT (Aalto Scientific Computing), so it is integrated into our computing and data management infrastructure and training programs. You don’t just get a service, but a whole community of experts. We can seamlessly work with existing technical services within your department for even more knowledge transfer - if it matches their mission, your existing technical services can even join us directly.
In practice, joining us means that you contribute a certain amount of funding, which allows us to hire more staff (combined with the other departments), to provide a certain amount of time to research groups in your unit. This is easy with basic funding, but we can also use Halli to work with project funding.
If you would like to join, contact Richard Darst or rse-group at aalto.fi.
Current sponsoring units
See Unit information
Project portfolio
This page lists examples of projects which we have done. As of early 2024, our internal project numbers are in the 200s.
Summary table

Example range of projects we do. We sometimes do things outside of this table, too.
Software publishing (M)
A CS doctoral researcher’s paper had code released along with it - with seven PDF pages of installation instructions, five pages of pre-processing instructions, and fifteen pages on how to run it. This code was effectively un-reusable, meaning that the potential for impact was much lower than otherwise.
Aalto RSE helped to transform this analysis into a standard R package that could be installed using standard tools and run using Snakemake, a workflow automation tool. Other researchers - including future researchers in the same group - could reuse the tool for their science. Time spent: 3 weeks. Benefit: one paper’s results become reusable (both internally and externally).
cor:ona data collection platform (L)
Cor:ona (Comparison of rhythms: old vs. new activity) was a research study of personal habits during the transition between remote work and post-remote-work. For this study to be successful, a platform to integrate survey and smart device data had to be created within a one-month time frame.
Aalto RSE worked with the researcher to do a complete and quick ethical review and build the platform. Unlike a hired software developer, our staff already knows the research methods and can work much faster - and stays around providing years of support with the post-processing whenever it is needed. [Source code on Github]. Time spent: ~1 month. Benefits: one study and multiple papers that could not otherwise exist.
Periodic table of quantum force fields (S)
A researcher wanted to create a website that could find quantum mechanical force fields (≈models). The researcher dropped by our daily scientific computing garage for advice, and we discussed options - by working with us, the path could be greatly simplified to a static site. We found a suitable open-source starting point, adjusted it to work for the needed purpose, and provided it to them for future work by the next day. The researcher has been able to carry on with the project independently. Time spent: 0.5 day, time saved: 4 days + simpler implementation.
Finnish Center for Artificial Intelligence (dedicated)
The Finnish Center for AI (FCAI) aims for its research to have an impact in the world, and to do that, its software must be reusable. FCAI identified that as a bottleneck, and thus provides 5 years of funding for Aalto RSE to hire a research software engineer dedicated to FCAI projects. This person works alongside all the other RSEs in the team, so FCAI has far more resources than a single hire alone could provide.
Business Finland project (M)
A research group had received Business Finland funds to develop an idea into a product, while still working within Aalto. They needed software development expertise to start off quickly. They were large enough to need a dedicated developer, but our initial work allowed them to start sooner and laid a good groundwork for the developer they hired later.
Debugging and Parallelisation (S)
A researcher had a huge dataset to run an analysis on. Sequential analysis would have been infeasible, so they wanted to run it in parallel. They tried to implement this themselves but got stuck, so they came to the garage, where an RSE helped them modify their code so that much of the work could be parallelized and the analysis performed. The resulting work was published in Fuel.
Introduction to Julia course (M)
Julia is a relatively new programming language that has found many users in certain fields. A professor taught an undergraduate course using Julia, but there were no sufficient introductory resources to prepare students for it. Aalto RSE found an open-source course prepared by CSC (Finland’s national scientific computing center), improved it to cover what the undergraduate course needed, and successfully taught it on demand. All course material is open source, so that others may also use it. Time spent: ~1 month. Benefit: course given twice, undergraduate course made better, open material produced, internal Julia expertise.
Releasing an open-source Github-based book (S)
A researcher had prepared the start of an open-source book and needed help and advice in releasing it as an open project. Aalto RSE helped with the technical setup to host the book on Github, the basics of Git usage, and creating a continuous integration system that would rebuild the book on every change. This allowed the book to both be fully open-source and to accept contributions from others. Aalto RSE also used its connections to Research Services to discuss the intellectual property aspects and how it might affect the possibility for future publication. Time spent: <1 day. Benefit: Open book and community project.
Releasing a microscope control code (S)
A researcher had created a code in Python to control a physical measurement device. This code could be useful to others, but had to be packaged and released. Aalto RSE helped to clean and release the code. Time spent: 1 day. Time saved: 1 month.
“Programming parallel supercomputers” course (M)
The “Programming parallel supercomputers” course, as the name says, gives students a first experience with HPC work. It can be difficult to find teaching assistants capable of giving the exercises a deep-enough check - in addition to confirming that they follow best practices on the cluster. There is also a secondary effect: making sure students see best practices in research software (development, documentation, etc.), which are often left behind in academic courses. Aalto RSE plays an important role in this course by bridging the technology with the teaching.
Aalto Gitlab improvements (M)
Aalto University’s Gitlab needed some scripting for management tasks. While not exactly in our scope, we were the logical team to take a look (as opposed to hiring outside consultants, especially since we could better fit in with an incremental development schedule and longer-term support). We talked with the system owners, refined the tasks, understood GitLab documentation, created the necessary scripts and improvements, handed them off to the sysadmins for production, and helped to understand tasks which should be done at another level. Time spent: 1 week. Benefit: improved service for Aalto University, significant cost savings. This type of project would be available for other internal service teams, assuming availability.
How to get started
Contact us as mentioned above, or read here for more details.
Requesting RSE support
You can contact us regardless of how small your issue is - or even if you just want to know whether we could help your project. At the very least, we can point you in the right direction.
Quick Consultations
We recommend you come to our daily garage sessions for a short chat. There are no reservations, and this is the online equivalent of dropping by our office to say hi.
Contact
We recommend you come to our daily garage sessions (see above) rather than send email (or come by after you send the email). We almost always have more questions and want to chat, so responding to email is slow.
Our email is rse-group@aalto.fi (the Triton email address scicomp@aalto.fi also reaches us). You can also use the structured request form (“Research Software Engineer request”). This guides you through some of our initial questions, but it goes to the same place as email and everything is read by a human anyway.
Next steps
See Starting a project with us for more info.
Starting a project with us
This page is mostly focused on how long-term scheduled projects work - those funded by the research groups themselves. Long-term projects are scheduled as a fraction of a full-time equivalent (FTE) over weeks or months.
For short-term code review tasks, come to any of our garage sessions and we will immediately take a look.
Types of service
Long-term service deals with jobs that last months, and are scheduled in terms of FTE percentage over months. This is often directly as salary from some grant, as a researcher would be.
Medium-term service deals with jobs scheduled in days. This is mostly funded by basic funding from the member units.
Short-term usually consists of support at one of our garages or a few hours of work. This is generally free (paid by unit basic funding).
Beginning
To actually make a request for support, see Requesting RSE support.
Initial meeting
First, you can expect a quick initial meeting between the researchers and RSEs. Depending on the size and complexity of the project, there may be several meetings, to find the right RSE and ensure that we can do a good job helping you.
What scientific background knowledge is needed? How long does it take to get started?
What type of contribution will the RSE make (see next section)? For purposes of scientific integrity, consider if this reaches the point of scientific authorship (see bottom).
Researchers: provide access to code, documentation, and relevant scientific background in advance, so that they can be browsed. The more we know in advance, the better we can estimate the time required and how to best help you.
How do you manage your data? To map things out, consider this one-page data management plan table.
Final outputs, location, publication.
Time frame and schedule flexibility.
What we can accomplish
It is very important to consider what the practical outcome of the project will be, because different researchers have very different needs. Together, we will think about these questions:
What is the object of focus?
Software
Data
Workflows
What is accomplished?
Create a brand-new product based on scientific specification. Is this done in an agile way (continuous feedback) or is the exact result known?
Improve some existing workflow or software, possibly drastically.
Improve some other project, primarily maintained by someone else.
Prepare a project for publication, release, or being used by more people.
Future plan
Primarily teach via example, so that the researcher can fully continue developing the project themselves.
Provide a finished product, which won’t need updates later
Provide a product that will be continually maintained by specialists (RSEs or similar - possibly us).
Scheduling and planning
RSEs will be assigned based on discussion between the researchers, RSEs, and Aalto Scientific Computing (the RSE group). Your agreement is with the RSE group, so your RSEs may change if there is a need (even though we’ll try to avoid this).
We will work with you to give a good view of how long something will take and of any risks (as in: what if it turns out not to be possible?). We can’t promise specific results in a specific time (no one can), but we do try to give the best estimates we can. This planning includes any buffers and backup plans.
It may take some time to fit your project into our schedule (this of course also depends on the urgency). We realize that your schedule is also uncertain, but we hope that you can find time to work with us once we start, since otherwise we may move on and requeue your project.
If we schedule a project but lose contact with you (no responses to our messages), we’ll assume you are busy with other things and may re-add the project to the queue, and we’ll need to find a new time in the schedule. Please let us know if you don’t have time, we understand the busyness of research.
A project doesn’t have to be done “all at once” but can be interleaved with your own work schedule.
Costs and time tracking
We track the time we spend and record it to your project.
Getting started
Version control
One can hardly do development work without a good version control system. Our first step will be to help you start using one, if you are not already, or to ensure you are using it optimally. If you don’t have a preference, we’ll recommend git and GitHub / Aalto Gitlab.
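For example (a typical minimal start, not a mandate): initialize a git repository in your project directory, commit early and often, and push to a private repository on Aalto Gitlab or GitHub; code review and collaboration can then be built on top of that.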
Research background
If some understanding of the scientific background wasn’t important, you might be hiring a software developer instead. Expect us to take some time to understand the science.
Understanding existing code
Also expect that, if there is any existing code, it will take some time for a new person to understand it. There is also likely to be a period of refactoring to improve the existing code, when it seems like not much is getting done. This is a necessary investment in the future.
During the project
Our RSE will most likely want to work with you, in your physical location (well, after corona-time), much of the time. It would be good to arrange a desk area as close as possible to the existing researchers. “Mobile space” close by is better than a fixed desk further away.
Our goal isn’t just to provide a service, but to teach your group how to work better yourselves after the project.
Software quality and testing
Software which is untested can hardly be considered scientific. We will work with you to set up an automatic testing framework and other good practices so that you can ensure software quality, even after the project. This also enables faster and more accurate development in the future. We’ll teach you how to maintain the tests going forward. All of this is in proportion to the complexity and needs of the project.
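As a minimal sketch of what this can look like (module and function names here are hypothetical placeholders, not our prescribed method), a first pytest-style test for a Python analysis function might be:

    # test_analysis.py - minimal pytest example; "analysis" and
    # "rolling_mean" are hypothetical names standing in for your own code.
    import pytest
    from analysis import rolling_mean

    def test_rolling_mean_basic():
        # A small system test on a known input.
        assert rolling_mean([1.0, 2.0, 3.0, 4.0], window=2) == [1.5, 2.5, 3.5]

    def test_rolling_mean_rejects_bad_window():
        # Edge cases are where scientific code often breaks silently.
        with pytest.raises(ValueError):
            rolling_mean([1.0, 2.0], window=0)

Running the pytest command in the project directory discovers and runs these automatically; the same command can later run in continuous integration on every change.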
We also pay particular attention to the maintenance burden of software: you’ll be using software much longer than you write it. We aim for simple, reliable strategies rather than the fanciest things right now.
After the project
We don’t want to drop support right after the project (that’s why you work with us rather than an external software developer). Still, we have finite resources and can’t fund work on one project from another, so we can’t do everything for everyone. You can expect us to keep passively supporting you during the “daily garage” time as best we can.
If your department or unit provides basic funding (see the implementation plan), then long-term service is included without a fixed limit. However, it is shared among everyone in your unit, and focused on strategic support that helps many people.
Tracking scientific benefits
We need to record the benefits of this service:
Researcher time saved
Computer time saved
Number of papers supported
Software released or contributed to
Open science outcomes (e.g. open software, data management)
New work made possible (e.g. grant or project wouldn’t have been possible)
Qualitative experience: increased satisfaction, educational outcomes, etc.
Releasing the software
A key goal of our support is releasing the software for broader use in the community (open science). Ideally, this will be a continual process (continue releasing as development goes forward), but we can prepare you for a first release later on, too.
We recognize the need to maintain a competitive advantage for your own work, but at the same time, if your work is not reproducible, it’s not science. We’ll work with you to find the right balance; a common strategy is that some core is open, while the actual analysis scripts which make use of that core are released with your articles.
Academic credit
Our RSEs do creative scientific work on your projects, which (depending on scope) can rise to the level of scientific authorship. This should be discussed early in the project.
The software-based scientific creativity can be different from what is published in your articles: in this case, it can make sense to release the software separately.
This is not to say that RSEs who work on a project should always be authors, but it should be considered at the start. See TENK guidelines on research integrity (authorship section).
A contribution that is significant enough to constitute scientific novelty, and such that the programmer must take responsibility for the outcome of the work, usually rises to the level of co-authorship.
It is OK to consider the code authorship as a separate output from the scientific ideas, and the RSE can help properly publish the code so that it is citeable separately from the paper.
Acknowledging us
You can acknowledge us as “Aalto Research Software Engineering service” or “Aalto RSE”. In papers/presentations, please acknowledge us if we significantly contribute to your work.
When talking with or presenting to your colleagues, please do talk about our services and their benefits. Our link is https://scicomp.aalto.fi/rse/ . Word of mouth is the best way to ensure our funding, so that we can continue to serve you.
See also
UCL RSE group processes: That page heavily inspired this page. Broadly, most of what you read there also applies to us.
For group leaders
You, or someone in your group, has requested Research Software Engineer services for one of your group’s projects. This service provides specialist support for software, data, and open science so that you can focus on the science that is interesting to you. You probably have some questions about what this is, and this page will answer those practical questions. For researchers using our services, also see Starting a project with us.
How it is funded
There are two funding strategies:
Short term (a few weeks or less) is funded by your department, if you are in one of the sponsoring units. You don’t need to do anything special.
Longer term (month or more) is funded from your own projects. See the information for grant applicants, it is also relevant if you already have funding you want to use.
You can use our services for both a specific project, or generally have us around on retainer to support all of your projects (for example, 20% time for a year). If you are applying for a new grant, see For grant applicants.
Access to data and tools
Our goal is not to come in, wave our hands, and leave you with something unusable. Instead, we want to come in and set you up to work yourself in the future. Thus, (if it’s necessary) we’ll want the same access to your group’s data/workspace/tools as you have.
This access is removed after the project is finished. We will try to remember this, but sometimes projects drag on with no clear ending (or you want long-term consultation), so you should also pay attention to this. As a matter of principle (and policy), we access the data the same way a normal researcher would.
NDAs, intellectual property, etc.
The RSE staff are Aalto employees and are automatically bound to confidentiality. They have signed the same extra confidentiality agreement as Aalto IT system administrators, and are similarly vetted.
Using our services doesn’t affect your intellectual property rights any more than another employee working on the project would. This is service-for-pay, so you get all rights. However, our RSEs expect to be acknowledged according to good scientific practice (see Starting a project with us).
For grant applicants
Warning
Grant applicants, if you are planning to use Aalto Research Software Engineers service, feel free to drop by SciComp garage for a chat, contact us at rse-group at aalto.fi, or fill out our request form.
This page is currently (2024-01) our best understanding of what is possible. However, we are still exploring what works and doesn’t, so contact us early so we can work out bugs together. Please send corrections to us.
If you’ve decided you would like to use the research software engineer services in your project for a long period, you might want to write it directly into your grant proposals. If written correctly, this can increase your competitiveness: your research will be better because you can use RSEs for porting/optimizing/scaling of codes, automation, data management, and open science, while concentrating the main project resources on the actual research question. We can do those listed things much faster than most researchers.
If you don’t know if you need our services, or need a consultation of what is even possible computationally, let us know, too!
Summary
We can serve as a specialist to complement the researchers in your project, which will make your grant more competitive.
Plan on a “Staff Scientist” salary level for at least a few months, scheduled when convenient for you (this can be as low as 10% time spread over a longer period).
We are written in as normal staff, since we are. Don’t mention subcontracting or purchasing or anything that implies them (this can make funders ask questions).
Contact us for more exact costs and our availability.
Funding options
Short-term services, less than one month per research group, are funded by various departments and schools and are free to the users (part of the “research environment” services). Longer-term service should be funded by projects - either an external grant or basic funds. There are two ways to write this into a project proposal:
As a research salary, just like other salaries on your project. This has fewer limits, but is less flexible because we need to go through HR and financial planning.
As an internal charging/purchased service, like usage of different infrastructures. This is flexible, but not compatible with some funders. It should work well with internal, basic funding.
Don’t mention subcontracting or purchasing in your grant text unless it really has to be organized that way. Make us appear as normal employees, since we are.
(1) Funding RSE salary
In this option, your grant directly pays the salary of an RSE from our team. To a funder, this appears the same as hiring a researcher, so is compatible with many types of grants. Some considerations:
This only works internally in Aalto.
Contact us for salary levels (it is roughly staff scientist) and availability.
Tell your controller this salary level and duration. Your controller will compute the necessary overhead and tell you if it is possible. (You should tell your financial staff/HR/etc. that the salary will be used for someone in the School of Science (SCI-common) and ensure that this is fine.)
Finance/HR will set up the Halli system so that we can bill our working hours directly to your project, based on actual time we work.
Realistically, we can spend up to about 80-90% time in a month on a single project (but you must make sure we have time first!).
We bill only the actual time relevant to your project, so while the costs are higher, in the end we are much more efficient than typical researchers who have many other tasks going on.
(2) Purchasing RSE services
Contact us with your needs, and we can give you an estimated price and time required. We can provide the services distributed over a time period that is relevant to you.
Warning: many funders (for example, the Academy of Finland or EU) don’t like for this to be used in their grants. If you do include this in a grant, carefully consult with grant/financial services to make sure this is possible.
In theory, we can serve groups outside of Aalto, but the overheads are quite large. We are working on an RSE network within Finland so that you can efficiently get RSE services no matter where you are.
General grant considerations
You can find general boilerplate text to include in your proposals at Boilerplate text for grant proposals, and you can read below for how to build it in even more.
Data Management / Open Science are big deals for some funders right now, and research software engineers are perfect for helping with these things, because they are experts in the associated technical challenges. The RSE service can also help in the societal impact sections: your outputs will be more ready to be reused by society. You could, for example, promise to deliver more types of outputs that aren’t exactly novel science but help society to use your results (e.g. databases, interactive visualisations, etc.).
Make sure you mention the general Science-IT infrastructure in the “research environment” section, i.e., the basic service provided by Aalto. You can copy something from the boilerplate text (first link in this section).
Specific funders
Academy of Finland
This applies to most general research grants, based on the general terms and conditions. Funding may be used to cover costs related to the research plan or action plan. The research site must fund basic project facilities - which at Aalto covers basic RSE services.
Interesting terms from the Academy: it urges that research data and methods be freely available. Section 6.2.2: “Research data and material produced with Academy funding in research projects and research infrastructure projects must be made freely available as soon as possible after the research results have been published.” We are experts in exactly this for the computational and data sciences.
As a RSE salary:
Contact us for the salary level which you should budget and our availability. Your controller will help you write this into the budget.
“Salaries, fees and indirect employee costs” may be included in Academy projects. These may go to research software engineers, who to the Academy appear equivalent to “normal researchers”. The RSEs are researchers.
Write in a Research Software Engineer as a salary for a set number of months. You may specify a name as N.N., or contact us for a name to include. We do not promise any one person, but we will work with you as much as possible. Contact us for costs per person and we will put you in touch with our controllers. You can also contact us to discuss how much effort you may need.
Note that “We recommend that they be hired for a period of employment no shorter than the funding period, unless a shorter contract is necessary for special reasons dictated by the implementation of the research plan or action plan (or equivalent). Short-term research, studies or other assignments may also be carried out in the form of outsourced services.” So, consider this in justifying the research plan.
Don’t call this subcontracting or purchasing. It’s normal internal salary.
As a service purchase:
Warning
Our latest information indicates that internal billing (this service purchase) is not really possible for Academy grants. You must use “As a RSE salary” above.
Please contact us for general costs and how many person-months you can get for a given price (roughly at the “Staff Scientist” level). Since estimating the amount of effort needed is difficult, contact us and we can help you prepare, with the help of our controllers.
The research site should provide “basic project facilities”, which Aalto does. Justify the extra purchase as beyond the basics.
Maximum amount: We recommend you include no more than XXXXX as a service purchase. Please see LINK (login required) for our prices, when paid via external funding.
Justification for funding (include in proposal): “Technical specialist work to ensure scientific and societal impact outputs follow best practices in software development and research data management practices, so that they can be of greatest possible benefit to society.”
Flexibility: we could flexibly invoice as needed for your project. You don’t have to decide the time period in advance (only follow your submitted budget), and different RSEs can work on different parts of the problem, so you always have the best person for the job.
European Commission grants
Internal billing is (for practical purposes) not possible for EC grants. Use the “RSE salary” method (and don’t call it subcontracting or purchasing - we are normal salary).
RSE financial practicalities
Let’s say you know what we do, have funding, and would like to send it our way. This page says what to do. You can read what we know about different funders, but it’s probably better to ask your controller directly if you already have the funding.
Instructions for group leaders
Please send a message such as this one to your controllers (we will tell you the relevant salary):
I am wondering what types of funding I have available to cover salary at [NNNN]€/month (Aalto internal, SCI) - do I have enough funding for [1 month / 5 days / 4 months at 25% / etc.] at this level? I would like to hire one of the Aalto Research Software Engineers for a short amount of time for a project. You can read more here:
If the answer is positive and you want to start the process, reply and include Richard Darst, the Research Software Engineer, and the RSE controller that we indicate:
The person is [USERNAME/EMAIL]. They can be added to Halli and they will directly allocate their salary to this project based on time worked, or costs can be paid by internal charging - please let us know. Please also let us know any requirements (maximum amount of time, valid months, etc.) Richard Darst (cc:ed) can answer any more questions about this.
Checklist:
Project discussed with researcher (and research software engineer, if relevant)
Initial request sent to your controller to confirm funds are available
Request for getting started sent to your controller, RSE controller, and…
Details relayed back to research software engineer
Instructions for junior researchers
Below is an example message to send to your group leader, if you need some inspiration:
Dear GROUP LEADER, as you know I have been going to the SciComp garage to get help from the Aalto Scientific Computing people. We are at the point that they would like to help more, and our group has already reached the limit of the free “research software engineer” service, which goes beyond the typical cluster support. Do you think we have a little bit of funding which would allow us to hire their services for a short period?
With a little bit of funding, we can make our work much better and faster, and I [won’t have to worry about [topic] / will learn about [topic] much faster].
You can read more about the service here: https://scicomp.aalto.fi/rse/
Checklist:
Project discussed with research software engineer
Request sent to supervisor
Instructions for department controller
Our Research Software Engineer (RSE) will work on a project for the PI according to the PI’s requests and be paid by their project. The RSE will track the time they spend and record the actual time used for the project in Halli, or salaries can be handled by internal charging.
Checklist:
Name staff (RSE) and duration of funding received.
Confirm funding conditions.
If using Halli:
RSE added to Halli by department controller (add permission for staff to record hours to the correct project).
Project number and any additional constraints (maximum hours, funding deadline, etc.) sent to the RSE, RSE lead (Richard Darst), and PI.
If using internal charging:
Arranged between department controller and RSE controller (SCI)
We’ll track all our time internally.
If it is an EU project, or there are any special constraints on how we keep our records, let us know. EU projects use Halli and are described on the project administration page, under the heading “special projects”.
Tell us how to update this page to be more useful to others.
About research software engineers
RSE community
Do you like coding, research, and the academic environment, but want slightly more emphasis and community around the software side? Join the Aalto RSE community. You can join whatever your current position is; you don’t need to be hired as a research software engineer. There are no requirements, just networking and development. This is also a “Triton powerusers group”.
RSEs have been an essential part of science for ages, but are hardly ever recognised. We have many here at Aalto, and Aalto SciComp is trying to build a community of these people. By taking part, you can:
Network with others in similar situations and discover career opportunities.
Share knowledge among ourselves (maybe have relevant training, too).
Take part in developing our services - basically, be a voice of the users.
More directly help the community by, for example, directly updating this site, helping to develop services, or teaching with us.
To join the community, see the general SciComp community page. You may want to join the Aalto RSE community mailing list, which is a general-purpose list which anyone may post to, including possibly internal job advertisements or other random discussion. Also, you should take part in the Nordic-RSE Finland chats - there is a strong Aalto presence there, and we use that as our Aalto chat time, too.
For RSE candidates and community
See also
We occasionally hire people. To get notified (of this and other similar jobs):
From time to time, job advertisements are posted on the Aalto University job portal. If you are considering Aalto, CSC also quite often has jobs open.
This blog post describes what you might want to know for applying for jobs with us.
If you are looking for jobs inside and outside of Aalto, consider following the Society-RSE job vacancies form.
If you are inside of Aalto, join the RSE community mailing list. This will get you announcements of our jobs, our events, and other research groups looking to hire an RSE skillset.
If you are in the Nordics/Baltics/etc., consider joining Nordic-RSE or CodeRefinery and participating in their events. We are active in these organizations, and this is a good way to learn how they work.
This page guides people into the interesting world of research software engineering: providing a view of what this career is like, and of what you could do if you want to develop your skills. This isn’t what you have to know to start off. It’s a map of ideas for both before and after, not a list of requirements.
If (some of) the following apply to you, you are a good candidate:
I like the academic environment, but don’t want to focus just on making publications.
I am reasonably good at some programming concepts, and am eager to learn more. I know one language well, can write shell scripts, and am generally familiar with Linux.
I am interested in moving into a scientist-developer kind of role in a company, but need more experience before I can make the transition.
Components of RSE skills
Research practices: Research is its own special thing, with special ways of working (this includes data management and open science). Research experience helps you connect to our target audience and know what works and doesn’t.
Programming and software development: Programming and general development and project management practices are important - but we must keep in mind the relatively small-scale nature of our projects. Basics are useful, enterprise-grade usually isn’t.
Open-source/open-project knowledge: We emphasize making research results reusable, and open source practices are a key way to do that.
A person coming from a research background will probably be good at research practices but likely needs to improve in software development; someone coming from an industry background, the reverse. Open-source/open-project knowledge is very person-dependent.
Let’s not forget a final component:
Mentoring and teaching: As in every job, social skills are the most important aspect, since you are working closely with a wide variety of researchers.
Research practices
To get experience with this, there is a fairly clear academic career path which can provide a good RSE education, especially if you look beyond producing as many papers as possible. To broaden your skills:
Get involved in a wide variety of computing, data, and software related research.
Publish datasets and software (properly) along with your papers - either separately or in a software/data paper.
Work on more collaborative projects (sharing code/data), rather than focusing only on your own work.
Manage your data well (remember, it’s not just about the software).
Use different types of computing environments for your work, especially cluster environments (see our HPC cluster lessons).
Software development
Technical skills are an important part of what we do: computing, data, and software. Many people take basic programming courses, but there are many important practices beyond that: version control, other tools, methods (Scrum, agile, etc.), deployment strategies, and so on.
Don’t let “software” trick you into under-valuing other forms of skills: data managers, computational specialists, etc are all important, too.
To develop these skills, try:
Get at least minimally comfortable with the command line.
Use version control (at the right level for your project). Can you make your project a bit more professional and level up your version control?
Add a command line interface to a code.
Make a modular, reusable code (for a minimal sketch of both this and the previous item, see the example after this list).
Add automated tests, continuous integration.
Play with a new language or tool for some small project - do you have experience in both high and low level languages?
Automate your workflow to make it reproducible.
Use the best data storage methods possible.
Make a merge request / pull request to a project you want to contribute to.
CodeRefinery workshops cover most of what you need.
Look at the Zen of Scientific Computing for other ways to advance some projects up those levels.
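As promised above, here is a minimal sketch of a modular code with a command line interface (all names are hypothetical; this is one common pattern, not the only one):

    # mytool.py - hypothetical example: importable core plus a command
    # line interface, so tests and other code can reuse the logic.
    import argparse

    def scale_values(values, factor):
        # Core logic kept separate so it can be imported and tested.
        return [v * factor for v in values]

    def main():
        parser = argparse.ArgumentParser(description="Scale numbers in a file.")
        parser.add_argument("infile", help="text file with one number per line")
        parser.add_argument("--factor", type=float, default=1.0)
        args = parser.parse_args()
        with open(args.infile) as f:
            values = [float(line) for line in f if line.strip()]
        for v in scale_values(values, args.factor):
            print(v)

    if __name__ == "__main__":
        main()

Run it as “python mytool.py data.txt --factor 2.0”, or import scale_values from another script or from an automated test.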
Open source / open project knowledge
One of our most important goals is to make research reusable and more open. For computational research, the practices of open-source projects are our main toolbox, since their outputs are often shareable and reusable by design. Don’t limit your vision to just software projects: for example, Wikipedia and OpenStreetMap are open projects focused on data curation.
To develop these skills, try:
On Github, subscribe to a project of interest to you and see how it is run. (Try to find projects that are large enough to use best practices and active communication, but not so large that there is a flood of messages.) Or, subscribe to some of the project’s mailing lists.
Report issues and try to help debug a project of interest to you.
Make a contribution to a project of interest to you.
Package and release one of your projects…
… and see if you can get others to use it.
Help others use one of your tools.
Mentoring and teaching
The job of an RSE, at least in our vision, is as much mentoring and teaching others as it is doing things. To improve at this, you could try:
Mentor younger researchers in computational tools.
Become the “local computational expert” in your group.
Teach someone how to use a tool you use.
Help teach some relevant courses.
Read “Motivation and demotivation”, a chapter in Teaching Tech Together.
Get involved as an exercise leader or co-instructor at CodeRefinery.
Role at Aalto
At least at Aalto, you will:
Provide software development and consulting as a service, depending on demand from research groups.
Provide one-on-one research support from a software, programming, Linux, data, and infrastructure perspective: short-term projects helping researchers with specific tasks, so that the researchers gain competence to work independently.
As needed and desired, teach and provide other research support.
A typical cycle involves evaluating potential projects, meeting, formulating a work plan, co-working to develop a solution, teaching and mentoring for skill development, and follow-up.
All of this is done as part of a team, to round out skills and ensure continuous internal knowledge-sharing.
You may also be interested in our presentations on the topic of “what we do”.
Training resources
These resources may be interesting to support your career as an RSE:
Skillset
Below, we have a large list of the types of technologies which are valued by our researchers and useful to our RSEs. No one person is expected to know everything, but we will hire a variety of people to cover many of the things you see here.
Most important is: do you want to learn things from this list? Can you do so mostly independently, but with the help of a great team?
More detailed list of relevant skills
This is an older, long list of relevant skills. It is inspiration, not a list of things you must know. No one knows all of these when they start off.
General tech skills: Our broad background on which we build:
Basic mandatory skills include Linux, shell scripting, some low-level programming language (C, Fortran), and programming in several more languages (Python particularly advantageous).
Good knowledge of computer clusters, batch systems, and high-performance computing.
Any additional programming, workflow, research, or system tools are a plus. You should have a wide range of skills, but the exact skills are not so important. Most important is sufficient fluency to pick up anything quickly. These skills should be listed as an appendix to the cover letter if not included in the CV.
Advanced parallel programming skills are a plus, but equally important is the ability to create good, simple, practical tools.
Git, GitHub, git-based collaborative workflows.
Software testing, CI, documentation, reproducible, portability, etc.
As an example, the ideal candidate will have near-perfect knowledge of all Software Carpentry, CodeRefinery, and the generic parts of our HPC lessons - or be able to fill in gaps with minimal effort.
But at the same time, we don’t just want people from purely computational backgrounds. You’ll work with people from experimental sciences, digital humanities, etc, and good people from these backgrounds are important, too.
A good attitude towards mentoring and teaching and an ability to explain complex subjects in an accessible way.
Commitment to diversity and equality among researchers of many different backgrounds.
Good knowledge of English. Finnish is advantageous but not required; our internal working language is English.
Specific examples: This is a selection of advanced skills which are useful (remember, this is what you might learn, not what you already know):
Advanced experience in debugging, profiling, and developing with Linux tools, including Git and the Intel and GNU compiler suites and their corresponding tools.
Software build tools like Make, CMake, and the like.
Advanced knowledge of parallel programming models, experience of parallel programming (OpenMP, MPI).
Advanced GPU computing / programming (CUDA, OpenACC, OpenMP models), experience of porting software to GPUs.
Profiling and optimization - both of low-level languages and high-level.
Knowledge of scientific software and packages including Matlab, Mathematica, Python libs, others is beneficial.
Experimental data collection, LabView, etc.
Workflow automation, shell scripting, porting from single machines to clusters.
Docker, Singularity, containers.
Data analysis tools like R, Python, pandas, numpy, etc. are beneficial.
Julia, Matlab, Mathematica.
Web development, cloud operations.
Scientific Computing on other operating systems.
Checklists
RSE project done
Discuss with the researchers
Explicitly confirm with customers that we are ending our focus on this project and won’t do more until we hear from them again.
Confirm it is publicly released, licensed, everything is done (or discuss what else might need to be done).
Make sure outputs are reported in ACRIS. This is important because it makes our work visible.
Software: Add Content → Research output → Artistic and non-textual output → Software.
Data: https://www.aalto.fi/en/services/research-data-and-acris (Add Content → Dataset)
For each entry, under “Facilities/Equipment”, add “Science IT”. This links it as an output of Aalto RSE.
Anyone can do this and add other relevant authors. The metadata entry can be made private or public, and the actual software/data is usually hosted elsewhere (and can be public or not).
Discuss what to do if there are issues in the future - garage, issue tracker, training courses.
Discuss what else may (or may not) need doing in the future.
Internal (RSE group) tasks
Issue tracker:
/summary should contain a several-sentence summary focused on the benefit to the RSE service (this is used for final reports, etc). Confirm that the other metadata is correct.
/contact and /supervisor contain people who may get emails about the project later (and shouldn’t contain people who may be surprised about automated survey emails); remove anyone who should not get them. Record the researcher time saved in /timesaved.
Outputs: /projects, /publications, /software, /datasets, /outputs
Get an interesting picture or screenshot for use in future material.
Not needed if there are overriding confidentiality considerations. The picture should never include personal data or data coming from a research subject (unless it’s already open).
Add it to triton:/scratch/scicomp/aaltoscicomp-marketing.git (pictures/rse/). Include a readme with a citation, confirmation of what usage permissions there are, and a one-sentence general description suitable for presentations.
Examples: screenshot of a website, screenshot of code that looks interesting, screenshot of a repository page, picture of a hardware device used, etc.
Add it to the next meeting agenda. We will collaboratively do an analysis to find lessons learned:
Facts about the project
Arrange facts into the big picture and timeline
Draw conclusions: what went well and did not go well? What were the causes of the good and bad things?
Lessons learned: what to do differently in the future.
Python project checklist
This checklist covers the major considerations when creating a high-quality, maintainable, reusable Python codebase. It is designed to be used together with an RSE, who can guide you through it (it is in a draft stage, and doesn’t yet have links explaining what each item means). Not everything is expected of every project, but a sufficiently advanced project will have most of these things.
Citeability and credit, authorship discussion
License
Version control
In use locally
In use on some platform (Github/Gitlab/etc)
Regular commits
Discuss issue tracker
Make one example pull request
Modular design
Standard project layout
Importable modules
Command line or other standard interface
(relates to packaging below)
Testing
Recommendation: pytest
Simple system tests on basic examples
More fine-grained integration or unit tests
CI setup
Test coverage
Documentation
Forms / levels
README file: good enough?
Project webpage
Sphinx project
Read The Docs
To include
About
Installation
Tutorials
How to / simple examples to copy
Reference
Release
Module structure
pyproject.toml or setup.py (see the sketch after this checklist)
requirements.txt or environment.yml
PyPI release
conda-forge
Zenodo
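As referenced in the release checklist above, here is a rough sketch of the setup.py option (the package name and dependency are hypothetical placeholders; the same metadata can equivalently go in pyproject.toml, which newer projects often prefer):

    # setup.py - minimal packaging sketch; "mytool" and the dependency
    # list are hypothetical placeholders for your own project.
    from setuptools import setup, find_packages

    setup(
        name="mytool",
        version="0.1.0",
        packages=find_packages(),
        python_requires=">=3.8",
        install_requires=["numpy"],  # your real dependencies here
        entry_points={"console_scripts": ["mytool=mytool.cli:main"]},
    )

With this in place, “pip install .” works locally, and a PyPI release is one step away.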
Other pages on this site: Package your software well, The Zen of Scientific Computing
Internal documents
We believe in openness, so we make our procedures open. They are subject to improvement at any time. Also see the FCCI Tech seminar series for how our broader team works internally.
Message templates
These are templates for different messages we might send. As you might expect, they are probably not suitable for use directly (even by us), but it’s better to record them than lose them, and better to be open than not.
Announcements
Contacting researchers
Did you know of the Aalto Research Software Engineer service (https://scicomp.aalto.fi/rse/)? It provides specialized support in software development and computational science. Could any of your infrastructure users benefit from this service?
The point of this service is to make sure that anyone can succeed in their research, regardless of their computational background. For example, we can provide software development, advice and support for those programming themselves, data management support, help packaging and publishing software, and so on. There are so many things that a person needs to know these days that no one can be expected to know everything.
We started in 2020 in the School of Science, and now have funding to support people from any school.
If you have any ideas, feel free to point your users to our service, https://scicomp.aalto.fi/rse/ . Or, we can arrange a discussion session to talk about ways to work together more closely, since I am sure there are ways that joining forces is best.
Project status (waiting)
We (Aalto RSE) still have an open issue in our queue about your project DESCRIPTION.
It’s still in our queue, and hopefully someone can get to it in WHEN. I’m wondering about the status from your side: is this still important to you? Have you figured out something else already, so that it’s not needed? Anything we should know for our scheduling and planning? Should we increase or reduce the priority? Would some smaller amount of help let you get going?
For short term stuff and consultations about the project, you can always try dropping by our garage, even before we actively start working: https://scicomp.aalto.fi/help/garage/
Follow-ups
/contact
/supervisor
Department/group:
Basic description:
- .
Current team:
- .
Each team does:
- .
Tech tools:
- .
Scientific tools/domain knowledge:
- .
Schedule
- Time estimate:
- Any deadlines?:
- Expected time, likelihood of going over:
- What happens if it goes over time? Backup plans?:
Links to existing docs:
- ...
/summary
/estimate
Feedback
Hi,
Some time ago, we helped you with ________________ as part of our Research Software Engineer service. Now that some time has passed, we would like to know if you have any feedback on our support. This is very important for ensuring the continuation of this service, so please take a minute or two to answer! A few numbers in reply to this message are sufficient.
First off, we wonder how much time (and mental effort) you think our work has saved you. (We know this can be hard to estimate, but any rough figure of the form “I avoided spending X days/hours to plan, implement, or debug what we would have done otherwise” helps.)
Then, what about research outputs: how many have we contributed to? Articles/papers, datasets, software projects released, projects supported in general, etc.
Do you have any other comments on our service?
Project administration
Note
This page is still a working document; discuss anything that looks like it should be improved.
Unfortunately (fortunately, since it means our work has value?), we need to track where our time goes in order to justify the benefits of what we do. There are two main uses of the data:
General reporting: being able to say how our time is distributed among departments and projects. This doesn’t have to be perfectly accurate (with so many small projects, it would be a big waste of time to try to be perfect), but it should be roughly proportional to the actual time spent. This is tracked in Gitlab.
Financial reporting and project payments. This needs to be accurate, but only for the few projects which have special funding. The master data is in financial systems, but Gitlab can sometimes be used to make this reporting a bit easier.
Typical project flow
Someone will contact us somehow. We try to get them to the garage or a chat as soon as possible.
Initial discussion. If it seems this should be a tracked project, then make the issue.
Be aware that it takes some time to get up to speed with a project. This should be considered when making the initial estimate, during the first consultation. When recording time spent, include the time it takes to get up to speed and learn whatever else is needed for the project.
Finance time tracking
For projects with their own funding (external or internal), you should get instructions about how to record it; for many projects, this means marking hours in Halli. All other projects (funded by the department’s/school’s basic funding) are marked in Halli to the standard RSE salaries project (ask for it).
Types of projects
Special projects
Examples: EU-funded projects
Special projects are their own distinct entity and are not mixed with the other work of our team. They receive dedicated days for their work, and are not given attention on other days. Because these get exclusive days, the master data of these projects is in Halli, and because Halli can be used for records later, they are not recorded in Gitlab. (Note: “special” does not mean better; it’s usually more productive to be available for researchers whenever they need us.)
Special projects get one Gitlab issue to track the overall contact, but it isn’t updated on a day-to-day basis.
Daily procedures: At the end of every day, record the working time in Halli. As much as possible, these project days should not be mixed with other work, but internal team meetings, etc. are allowed if necessary. In Halli, record each day’s worktime (scaled to the standard 7.25 h/day) in proportion to the time spent on the special project (allocated to that project) and on internal work (allocated to RSE-salaries).
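For example (our reading of the scaling rule, with made-up numbers): on a day with 6 hours spent on the special project and 2 hours of internal work, record 7.25 × 6/8 ≈ 5.4 hours to the project and 7.25 × 2/8 ≈ 1.8 hours to RSE-salaries.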
Normal funded projects
For projects which provide their own funding but aren't special, Gitlab is used to track the time we spend on them. The main purpose of Gitlab is to record the department distribution of all of our basic funding, for which Halli can't hold all the needed information. Other funded projects which can be intermixed with our normal work can also fit into this category.
Daily procedures: A Gitlab issue is created for every project and used for each day's work, with funding source Funding::Project. Time is recorded in Gitlab and may be mixed with other projects however the customer sees appropriate. Halli is marked to the respective project and is correct at least by month.
Internal charging projects
"Internal charging" projects are funded, but are paid in one sum for a certain amount of work, and there is no place to mark hours in Halli. These are mostly certain types of basic funding. Gitlab is used to track time spent on these projects.
Daily procedures: Like above for Gitlab, with funding source Funding::Project. Halli is marked to the standard RSE-salaries project.
Basic funding projects
These projects are paid by our basic funding, provided by our sponsoring units. This also includes all of our internal work, meetings, development, and teaching.
Daily procedures: Same as above, with Gitlab funding marked as Funding::Unit.
Gitlab day-to-day procedure
See the rse-timetracking repository for info on how to use Gitlab. But the actual data is in rse-projects, a separate private repository.
Project prioritization
This page describes the types of projects we have and the general principles of how we prioritize them. It doesn't say exactly how things are prioritized (there are too many factors for that).
Types of projects
Size G (for "garage"): the smallest projects, handled within the daily garage. A few hours of work and not scheduled; they are handled as people come to the garage. Entered in the garage diary, but not the rse-projects tracker.
Size S ("small"): <= 1-2 days.
Size M ("medium"): <= 1 month.
Size L ("large"): > 1 month. These are generally paid by the projects themselves.
RSE staff who are fully funded from a certain project are outside the system described on this page, and work on the projects as decided by their funders.
Prioritization
G projects usually get priority, but that is because they are drop-in and not scheduled. Whoever is available will help (usually the same person, but it varies). We help for a reasonable amount of time (depending on need and busyness) in each drop-in session. A session ends with the problem solved, a request to come back the next day, or an upgrade to an S-level project.
S projects are often used as fillers during downtime in other projects. We often have a general priority list, but the actual start time can be a bit uncertain.
M projects are sort of in the middle. They are scheduled when possible, but since they aren’t paid by the research groups the work might be more intermittent.
L projects, being paid by a particular group, usually get priority. However, there is often downtime during these projects, which is used for other projects.
Some research groups provide “retainer” funding: long-term funding without a specific L-size project. Their funding is used for whatever S and M projects come up, and those S and M projects get a much higher priority (of course, depending on the urgency of the project itself).
There are two main steps in our prioritization:
General discussions during the weekly team meetings.
Each RSE’s evaluation of each project, based on their knowledge of the work, the time they have, and what the benefit will be.
Per-project prioritization factors:
Self-evaluation of usefulness and importance by the researchers
Benefit to open science and broader strategic impact
Long-term impact to research (for example, improved skills or use of tools)
Priority for units which provide funding
Diversity and balance, including diversity goals.
Implementation (2020 plan)
About this page
This is our tentative implementation plan, as of August 2020. It is always subject to revision, but is a somewhat controlled document.
About
Research Software Engineers provide specialized scientific computing and data management support to researchers, beyond what is currently offered by Science-IT. Their funding is guaranteed by departments/schools/other units, but after the ramp-up phase most funding is expected to come from the research projects themselves.
Services include, for example, software development, scaling up or optimizing computations, taking new technologies into use, and in general promoting best practices in new and existing research using computational methods.
Funding types and sources
Funding has three types:
Ramp-up/Guarantee (R/G): Ramp-up funding to do initial hires, until project funding takes over
Ramp-up: department/schools/other units allocate a certain amount of money to do hires.
Units which provide Ramp-up/guarantee get first priority for their projects.
Replaced with project funding (below), if there are no projects then used for basic services (below).
Project (P): External or group money, allocated by a PI for a specific task in their group.
Basic (B): Allocated from units for short-term basic service for all of its members.
Allows short, strategic assistance without microtransactions
Science-IT work is a type of basic work, but may be requested by the Science-IT team instead of the researchers themselves. (For example, Science-IT has a long list of inefficient hardware use and inefficient software practices which can keep RSEs occupied for a long time. RSEs can also work on Triton/scientific computing technical development projects, which helps RSEs gain competence for the rest of their tasks.)
Time allocation principles
We track time spent per unit. Fairshare algorithm: the unit with the largest “deficit” in time gets priority for upcoming projects.
Units which provide ramp-up/guarantee funding get priority for their projects.
Project funding replaces ramp-up/guarantee funding.
Time paid from basic funding is allocated to tasks within the unit with the greatest strategic benefit, for example helping an entire group to use better tools or fixing extreme waste of resources.
When a group provides project funding, they can decide the tasks the RSE will do.
Ramp-up plan
This is a rough estimate of the type of demand we expect:
Distribution of work:

                            2020 H2   2021   2022   2023   Long-term
  FTE                       2         2–3    3–4    3–5    4+
  Project work              20%       50%    60%    70%    70%
  Basic work for units      50%       40%    30%    20%    20%
  Basic work for Science-IT 30%       10%    10%    10%    10%
Our initial survey reached only Triton users and had 40 responses: 60% said "quick consultation", 60% said "short term, 1–2 days", and 40% said "medium term, weeks to months".
Actual ramp-up depends on funding cycles, research timing, and human psychology.
Start-up funding (already guaranteed)
(section removed; to be placed elsewhere)
Funding practicalities
Principle: the daily rate is roughly equal to “senior postdoc/staff scientist” salary + overheads.
Principle: When working for a research project, the RSEs record those working hours in Halli to that project. The corresponding portion of the salary is then automatically charged to the project. Remaining hours are recorded to the Dean’s unit RSE project, and once a year we split these costs and send them to each department. [Updated 2020-11-05]
(details to be filled in by Finance)
Measurement and KPIs
Number of projects and researchers who have been given support
Number of researcher-days saved, as estimated by our customers.
Fraction of project funding vs total program funding
Communication
Units which fund us will be informed of our activities at least every 6 months.
“As open as possible, as closed as necessary”. All RSE program data, documents, and statistics will be public, excluding actual project funding and information from the customers.
Risks and ramp-down
Primary risk: making permanent hires, yet not being able to sustain the program long-term.
Mitigation: we will only hire RSEs who can be absorbed into Science-IT naturally, should the need for this service fade away.
Risk: difficulty in reaching researchers and explaining what we do
Mitigation: Science-IT has a long list of researchers who are using research services inefficiently: they can be contacted directly to inform about this service. Helping them and producing best practice examples for the future can keep several people busy for years.
Risk: Researchers see need, but group leaders unwilling to pay
This is indeed a risk, but there is precedent from other countries that enough people are willing to pay. There will likely be a slow start, but as time goes on, expenses incurred by this service can be written directly into the budgets of funding applications.
In our ramp-down strategy, we absorb the RSEs into Science-IT, CS-IT as part of its development efforts, or into other existing teams.
Job descriptions
Warning
This page is still in draft form and being discussed and developed. See the note on the parent page.
These are job descriptions for RSE positions. They are not yet formal HR job descriptions and won't be directly used as such, but they provide a vision of our career ladder.
An RSE is a researcher whose advancement of science is measured not by the number of papers, but by the quality of software and contributions to open science.
RSE 1
An RSE 1 is just starting their career and is being introduced both to software tools and the research process. This RSE would get mentoring much like a new doctoral student does, but instead of aiming at publications, they would aim at quality, released software.
Qualifications: Master's degree, with a thesis combining computation and research, or software development with some research qualifications, but little real-world research experience.
Pay/job level: roughly like a master's-level employee or PhD student. Advancement: would be expected within 1-2 years.
RSE 2
Able to competently work on their own projects using tools they know, while learning new tools effortlessly. They are currently learning to find the right tool for the job and to connect the technical task (software- and data-related) to the impact on society, Aalto, and individual grants.
This is roughly equivalent to a postdoctoral researcher: a transition period between the academic skills of a doctorate and whatever may come next. In particular, this can serve as a bridge between a (somewhat more theoretically focused) doctoral degree and a job in industry, and CV and skills development are in line with this.
Qualifications: Doctorate or extended work experience. Pay/job level: similar to postdoc.
Advancement: expected to advance within 2-3 years. This person is still in training (much like a postdoc) and is probably deciding which way to take their career.
RSE 3
Like above, but additionally able to independently negotiate with research groups to plan a project, including deciding tools and expected results. In particular, an RSE 3 should be able to explain the value of good software practices to researchers and plan/advocate for good open science and research data management practices across various fields.
Pay/job level: like staff scientist, always permanent.
Advancement: A person is a competent, independent scientist/engineer at this point, and advancing is not needed for everyone. Of course, lifelong learning always continues. To be honest, advancing in the academic system is difficult, and many people will make a horizontal move to another place.
Beyond
At this point, you are not exactly developing RSE skills but leadership skills. This is surely adjusted to each person individually, but two possible levels include:
RSE leader responsible for a department, school, or research area.
RSE group leader responsible for university-wide leadership.
Other internal/parallel advancement
Other career development is not a part of the Aalto RSE program (yet?), and to be honest it’s hard to see an internal advancement in the current academic system (by the time you get to the top of our team, you are already at the top). Still, there are many ways people can continue their career development depending on their career goals, for example:
Tech lead of larger RSE projects (few projects require this)
Study and develop new technologies for production (perhaps a parallel move to an IT team)
Management, either of RSE group or other services
Applying for grants, leading projects, etc. as a staff scientist might do (this would be outside the RSE service team)
Mentoring or supervising students or other researchers
At Aalto, these aspects are not yet developed, and some of them would be horizontal moves outside the RSE team (or collaboration with someone outside the team). At some point, people have to take their careers in the direction they want and begin combining various unique skills.
Commercial developers
We don't plan on competing with commercial developers, but the differences from an RSE 3 are that:
A software developer can do what is asked, but may not work with the researcher to figure out what they actually need. A software developer will probably be more requirements- and product-based, rather than doing agile research work to develop a tool over time.
A software developer may produce a product that is not sustainable in an academic setting: one that requires too much focus and specialized knowledge to be improved in an academic environment.
A software developer may use more modern and industrial-scale tools.
A software developer from outside would come in and leave; an RSE in this group provides longer-term support (but this is more a property of the group than the person).
Unit information
This page describes the Aalto units which are supporting the RSE program and what their priorities are.
The service is currently (early 2023) mainly funded by the School of Science, with a grant from IT Services to allow use through all of Aalto.
See "For units such as departments, schools, or flagship projects" if you would like to join the RSE service as a department or school.
SCI
Supporting the whole community.
CS
Supporting the whole community.
NBE
Supporting the whole community.
PHYS
Supporting the whole community.
FCAI
FCAI sponsors several research software engineers, who both do general work and targeted work. In effect, FCAI projects get a higher priority and management sends some strategic projects to us for intense support.
Rest of Aalto
IT Services provides a grant to support research in all of Aalto.
If your project has its own funding, we can support it. And the SciComp garage support is always available.
Advisory board
Warning
This page is a draft.
This page describes the advisory board of the Aalto RSE program and hosts the results of its meetings. As a matter of principle, all material is open on this page (though specific items may be removed).
Purpose of the advisory board
The advisory board provides advice to the strategy (and when relevant, day to day implementation) of the Aalto RSE program and its relation to research, scientific computing, and teaching at Aalto.
Current advisory board
Currently, the advisory board is the Science-IT board.
Meetings (Section not in use)
Topics for the next meeting and results from previous meetings are located here, newest first.
Next meeting
Purpose of the advisory board and its roles. How often should we meet?
What are your priorities?
What is the threshold for your department to "pay" for the service?
How can we find customers?
How much do we focus on cost recovery, and how much on basic work?
What are our KPIs? See Measurement and KPIs and Tracking scientific benefits.
Cost recovery from projects
N ongoing projects and N completed projects
N publications supported.
N open outputs produced (non-publication: datasets, software, etc.)
Survey (of PIs) of benefits afterwards.
Estimated time saved.
2022 Aalto RSE report
Summary
The Research Software Engineering service allows researchers to take on more ambitious computational projects, and makes existing projects much faster and higher quality.
About 100 projects in a bit less than two years.
Perhaps 5× return on time spent (time we spent vs time researchers saved).
There is no shortage of RSE projects; we could get more if we did more outreach (which we don't focus on, since we would then be over capacity).
We haven't been able to receive much project funding, due to financial practicalities: grant rules make internal invoicing difficult, and we complete most projects so quickly that the transactions would be too small.
We have gotten other long-term support: FCAI has supported a dedicated RSE, IT Services has provided funding to extend beyond SCI.
Our proposal is that groups receive ~1 month of free service, paid with department/school funding. After that, they should find their own funding.
Most projects take well under one month, though, so we still focus on basic funding.
However, there is a steady stream of longer-term project proposals which offer funding.
We ask for
Continued basic funding at the current levels.
Help in finding the researchers and projects who can most benefit from this service. Can our results be better reported as Open Science/societal impact stories? Should a small amount of RSE time be written into every grant?
Recommendations for other schools/departments to join us.
Current status of Aalto RSE
History of Aalto RSE
December 2018: Idea (“Computational support scholar” postdoc-type position)
December 2019: initial funding from SCI
October 2020: first hires (permanent)
Now: three permanent full time staff, continuous stream of projects.
Current staff and jobs
We are part of Science-IT:
Three full-time RSEs (the only ones funded by this service)
One staff leading the RSE group + working as a RSE
Two other staff with RSE funding from specific projects
Three other staff focused on infrastructure support, but “we are all RSEs anyway”
Types of projects
There is no shortage of projects, but we are also not yet over our capacity.
We don’t advertise too much, since that would take us over our limit.
Thus, there is definitely capacity to expand.
Projects fit into two main categories:
RSE projects take days to months, are recorded in our issue tracker, and include long-term support.
This is the "classic work" of an RSE.
Garage help covers small questions that come up in the "daily garage" and are answered immediately.
Garage is our daily support method, answering small questions.
Garage transforms research from “trial and error” to “professional quality”.
Project stats
101 researcher projects in ~ 1.75 years.
Overall “researcher time saved” is generally 5× “RSE time spent” (self-reports from customers)

Figure: Time spent by department, 2020–2022. Not all time is recorded, and the figure includes only full-time RSE time. Note that we have other funding sources that allow us to work outside of SCI, and that Science-IT receives general funding to serve the whole university.

Figure: Time estimates by task type for all recorded projects, including future projects, leads, and canceled ideas, 2020–2022. Projects have multiple tasks/benefits and their time is counted under every applicable task, so values should only be used as a relative comparison.
Garage stats
From October 2020 – August 2022, about 500 visitors logged (about half of visits are recorded).
An average visit takes 30-60 minutes of support.
Most visits involve answering tech/software questions to help research, and teaching people to be self-sufficient.
We also estimate overall “researcher time saved” is 5× “RSE time spent” (self-reports from customers)

Figure: Garage customers by department, 2020–2022 (small sample of data). Old garage data is extremely sparse, so this is more of a current estimate.

Figure: Garage customers by title (small sample of data). We only recently began collecting this, so it is incomplete but roughly representative. We only support researchers and staff; students not doing research are directed to student resources.
Current and future funding
Financial transaction difficulty
Original plan: try to get most funding from grants.
Finance (for very good reasons) doesn’t want to do small transactions - minimum 1 month. Thus, we haven’t been able to accept much project funding.
Academy/EU rules don't allow easy internal invoicing, so projects must pay salaries directly. This creates more overhead.
We need high-level leadership support on this topic.
Our current project funding policy
Each research group gets ~1 month of free RSE time sponsored by basic funding.
After that, a group is expected to provide their own funding for future RSE projects.
However, we finish most projects in less than a month.
Future funding plan
We should maintain at least ~2 FTE of basic funding for the near future for our current number of customers (≈ SCI).
Any increases would be used well, though.
Future hires could be made when project funding is enough to justify costs (SCI funding as buffer between project periods)
A fair number of projects (~10-20) have written months of work into submitted grants, funded us, or offered funding.
More basic funding from other departments?
IT Services has provided pilot funding (3 months) to expand to other schools, which has been a success.
Future plans
Planned long-term funding
The Finnish Center for AI has committed 4-5 years of full-time RSE funding; this was used to hire a third RSE.
We are currently (September) planning to get more IT Services funding to secure the service beyond SCI. We will need to carefully check how this affects our staffing levels.
These types of strategic investments seem practical and scalable.
Wanted: Better outreach and impact
There is no shortage of projects, and advertising more will surely fill us up.
But, we can still increase the impact of the projects we select. Can you help point the most important projects to us?
Especially societal impact (public use of data and algorithms) could give us many more projects.
Expansion to other schools
We expect this service to expand to other schools and universities in the future (bringing their own funding).
This will allow a broader knowledge base from which any individual project can draw.
Please recommend to other leaders to join us in the RSE concept.
See also
The CSC optimization service is essentially an RSE service, targeted at CSC/LUMI resources (but in theory it can do more). They are good at low-level programming and similar tasks.
The Nvidia AI Tech Center provides free RSE services for research projects for Finnish Center for AI members (includes Aalto).
Other links
Aalto Scientific Computing, the organization behind this program.
Nordic RSE community, currently in the process of being formed (Aalto SciComp and the RSE program is a member).
Keynote video by Mike Croucher on the rise of RSEs and their benefits
The UK RSE association is quite advanced in promoting RSE careers.
Why do we exist?
Note the bottom section on page 105(print)/106(PDF) of the 2018 Research, Art, and Impact assessment.
Point three of Vision for Nordic Open Science Data Collaboration, by the Nordic e-Infrastructure Collaboration 2022 program committee.
Scientific computing
In this section, you find general (not Aalto specific) scientific computing resources. We encourage other sites to use and contribute to this information.
Scientific computing tips
Encryption for researchers
This page describes the basics of encryption for an audience of researchers. It describes how encryption may be useful (and when it is not needed) in a professional research environment, in order to secure data. It doesn't describe encryption for personal use (there are plenty of other guides for that), and it doesn't go into deep details about cryptography. Cryptography is the type of field with a huge number of subtle details, where everyone has their own opinion; this guide is designed to provide an understanding for basic use, not to cover every single subtle point.
Status: this is somewhat complete, but is not a complete guide. It will be extended as needed.
Summary
Modern cryptography is quite well developed and available in many places. However, the human side is very difficult: encrypting something but not keeping the key or password secure has no benefit. To use encryption, you need to decide what your goals are (who should have access, and who you want to keep the data safe from) and then plan accordingly. The security of cryptography is decided more by how you manage the keys and the process than by the deepest technical details.
Key management
The point of encryption is to trade a hard problem (keeping a lot of data secure) for a more limited problem (keeping a single key or password secure). These keys can be managed separately from the data. This immediately takes us to the importance of key management. Let's say you can't send data over email unless it is encrypted. If you encrypt it and send the password in the same email as the encrypted data, you have managed to technically satisfy the requirement while adding no real security at all. A better strategy would be to give the password to someone when you meet them in person, send it by another channel (e.g. SMS, but then it is only as secure as SMS+email), or even better, use asymmetric encryption (see below).
Deciding how you will manage keys is the hardest part of using encryption. For some fun, next time you hear someone talk about using encryption, see if they mention how they keep the keys secure. Usually, they don't, and you have no way of knowing whether they are actually doing it securely.
Symmetric vs asymmetric encryption
There are two main types of cryptography. They can both be considered equally secure, but have different ways of managing keys.
Symmetric encryption uses the same password/key for encrypting and decrypting. It is good because it is simple: there is only one key or password you need to know, and it is easy to think "one dataset = one password". However, everyone needs to know the same password, and it can't easily be changed. Since the same password has to be everywhere, this can be a bit insecure depending on the use, and it can be complicated to keep that password secure (if there are many people, or if it needs to be automated).
Asymmetric encryption has different keys for encrypting and decrypting. You use a "public key" to do the encryption (which requires no password - everyone in the world can know this key and your data is still secure). You have a separate private key (+password) which allows only you to decrypt it. This separation of encryption and decryption was a major mathematical breakthrough. Anyone who needs to receive data securely has their own public/private key pair, and all the public keys are, well, public. When you want to send data to someone, you just encrypt it using their public key, and there is no need to share a password. This allows you to: encrypt so that multiple people can read it, encrypt automatically without a password, and encrypt to someone not involved in the initial process.
With asymmetric encryption, there are some more things to consider, for example: how do you make sure that you have the right public key?
Encryption programs
This lists some common programs, but this should not be taken to mean that using these programs makes your data safe. Security depends on how you use the program, and security will only decrease over time as new analysis is done. It is usually best to choose well-supported open source programs where possible. More detailed instructions will be provided as needed.
7zip
7zip is a file archiver (like zip). It can symmetrically encrypt files with a passphrase.
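As a rough sketch of basic usage (assuming the 7z command-line tool; file names are placeholders):
$ 7z a -p secret.7z data/    # "a" adds to an archive; -p prompts for a passphrase
$ 7z x secret.7z             # "x" extracts; prompts for the same passphrase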
PGP
PGP is a set of encryption standards (and also a program). It has a full suite of encryption tools, and is quite stable and well-supported. You often hear about PGP in the context of email encryption, but it can be used for many things.
On Linux systems, it is normally found as the program gpg (Gnu Privacy Guard). This guide uses gpg.
Full disk encryption
Programs can encrypt the entire hard disk of your computer. This means that any data on it is safe, should your computer be lost. There are programs to do this for every operating system, and Aalto laptops now come encrypted by default.
Using symmetric encryption with gpg
Encryption:
gpg --symmetric input.file
Decryption:
gpg input.file.gpg
This will ask you for a password. If you do not want it to, you can use --passphrase-fd to pass it automatically. Normally, keeping a password in a file is considered quite insecure! Make sure that the permissions are restrictive. Anyone who can read this file once will be able to read your data forever. The file could be backed up and spread all over the place - is that what you want? IT admins will technically be able to see the passphrase (though they do not look). Is this all within the scope of your requirements?
cat pass.txt | gpg --passphrase-fd 0 --symmetric input.file
Using asymmetric encryption with gpg
When using asymmetric (public key) encryption, you have to generate two keys: public and private (they are made at the same time). The private key must be kept private, and has a passphrase on it too; this provides an added level of security on top of the file permissions.
There are plenty of guides on this available online.
You can encrypt a single file to multiple keys. This means that the owner of any of the private keys can decrypt the file. This can be useful for backups and disaster recovery.
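A minimal sketch of the asymmetric workflow (the email addresses are placeholders for real key IDs):
$ gpg --full-generate-key                          # create your public/private key pair
$ gpg --export --armor you@example.com > you.pub   # export your public key to share
$ gpg --import alice.pub                           # import a collaborator's public key
$ gpg --encrypt -r alice@example.com -r you@example.com input.file
$ gpg --decrypt input.file.gpg > input.file        # any listed recipient can decrypt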
General warnings
Strong encryption is serious business. It is designed so that no one can read the data should the keys or passwords be lost. If you mess this up and lose the key/password, your data is gone forever. You must have backups (and those backups must also be secure), …
If you keep passwords in files, or send them insecurely anyhow, then the technical security of your data is only as great as that of the key/password.
The strength of your encryption also depends on the strength of your password (that is the reason it is often called a "passphrase" - a phrase is more secure than a standard password). Choose it carefully.
Advanced / to do
How much security is enough?
Set cipher to AES (pre 16.04)
Git
Git is a version control system. This page collects various Git tutorials and resources
Version control systems track changes to files. This means that as you are working on your projects (code, LaTeX, notes, etc.), you can track their history: you can see earlier versions and collaborate better. Using version control at least for code should probably be one of the minimum standards of computational research.
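As a minimal sketch of day-to-day use (the file name is just an example):
$ git init                             # start tracking this directory
$ git add analysis.py                  # stage a file
$ git commit -m "Add first analysis"   # record a snapshot in the history
$ git log --oneline                    # view the history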
“Git is a distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.” Git
Note
This page is git in general, not Aalto specific.
aalto/git contains advice on the Aalto Gitlab, a repository service for the Aalto community integrated with Aalto systems.
Basic git tutorials
There is an interactive git tutorial from codeschool and github. Good for your first use.
Software carpentry has a good tutorial focused on researchers.
Gitlab cheatsheet.
More references
You can search for many tutorials online.
software-carpentry.org (an organization that teaches development to scientists) has a very good tutorial online.
The book “Pro Git” is online.
Read chapters 1-3 for a good introduction to using git for your own projects.
Read chapter 5 for a good introduction to using git to collaborate with others.
There’s a somewhat official documentation place - including videos.
There is an official tutorial but it is probably too theoretical.
All git commands have very good but very detailed manual pages - type man git COMMAND or git help COMMAND to see them.
Interactive git cheatsheet (very good once you know the basics).
Gitlab-specific information:
Other hosting services
Realistically, use version.aalto.fi for most projects related to Aalto research, and GitHub if you want to make something open source with a wider community (you can also make open repos in Aalto Gitlab, it's just harder for random people to contribute). For non-work private repos, you have to make your own choice.
Github is a proprietary commercial service, but extremely popular. No free private repositories or groups (but you can pay).
Bitbucket is also somewhat popular, limit of free 5 private repositories (but you can pay for more).
Gitlab.com is a commercial service but makes the open-source Gitlab edition. Gitlab.com offers unlimited private repositories.
source.coderefinery.org is another Gitlab hosted by the Coderefinery project, a pan-Nordic academic group. It might be useful if you have a very distributed project, but realistically for Aalto projects, use Aalto gitlab.
Git-annex for data management
See also
Video intro to git-annex, from Research Software Hour.
DataLad is a researcher/research data management focused frontend to git-annex. This page is a relatively technical introduction to what goes on inside of git-annex, so the DataLad handbook might be a better place to start, and then consult this page for another view / more detailed information.
git-annex is an extension to git which allows you to manage large files with git, without checking their contents into git. This may seem contradictory, but it basically creates a key-value store for large files, whose metadata is stored in git and whose contents are distributed using other management commands.
This page describes only a very limited set of features of git-annex and how to use them. In particular, it tries to break git-annex into three “simple” types of tasks. Git-annex can do almost anything related to data management, but that is also its weakness (it doesn’t “do one thing and do it simply”). By breaking the possibilities down, we can hopefully make it manageable. The three layers are:
Level 1: Track metadata in git and lock file contents locally. Even on a single computer, one can rigorously track data files to record who produced the data, the history, and the hash of the content, even without recording the contents in git. On top of this, files can be very safely locked to prevent accidental modification of primary copies of the data. (Commands such as git annex add.)
Level 2: Transfer and synchronize file content between repositories. Once the metadata is tracked and the git repository is shared, you might want to move the content between repositories. You can easily do this with git annex get and git annex copy [--to|--from]. You can put any file anywhere and the metadata is always synced.
Level 3: Manage synchronization across many repositories. Once you have more than two (or even more than one) repositories, keeping track of the locations of all files is hard. Git-annex solves this as well: you can define what content should be in each location and data is automatically distributed. So, for example, you can insist that all data is always stored in your object storage, all active data is also on the cluster, and user environments have whatever is requested. Git-annex is very focused on never losing data: it can ensure that at least one locked copy is always present in some repository. (Commands such as git annex wanted, git annex numcopies, git annex sync --content.)
The biggest problems are that it can do everything, which makes documentation quite dense, and the documentation can be hard to navigate.
Background
You probably know what git is - it tracks versions of files, and the full history of every file is kept. When something is recorded in git-annex, the raw data goes into a separate storage area, and only links to that data and the metadata are distributed using regular git. So, all clones know about all files, but don't necessarily have all the data. Using git annex get, one can get the raw data from another repo and make it available locally.
For example, this is a ls -l of a real git repository which has a small-file.txt and a large-file.dat. You see that the small file is just there, but the large file is a symlink to .git/annex/objects/XX/YY/...:
$ ls -l
lrwxrwxrwx 1 darstr1 darstr1 200 Feb 4 11:08 large-file.dat -> .git/annex/objects/X4/xZ/SHA256E-s10485760--4c95ccee15c93531c1aa0527ad73bf1ed558f511306d848f34cb13017513ed34.dat/SHA256E-s10485760--4c95ccee15c93531c1aa0527ad73bf1ed558f511306d848f34cb13017513ed34.dat
-rw-rw-r-- 1 darstr1 darstr1 21 Feb 4 11:06 small-file.txt
If the repository has the file, the symlink target exists. If the repository doesn't have the file, it's a dangling symlink. git add works like normal; git annex add makes the symlink.
Now let's run git annex list here. We see there are two repositories, here and demo2. large-file.dat is in both, as you can see by the Xs. ("web" and "bittorrent" are advanced features, not used unless you request them... but they give you an idea of what you can do):
here
|demo2
||web
|||bittorrent
||||
XX__ large-file.dat
The basic commands to distribute data are git annex get, git annex drop, git annex sync, and so on. The basic principles of git-annex are data integrity and security: it will try very hard to prevent you from using git/git-annex commands to lose the only copy of any data.
Basic setup
After you have a git repository, you run git annex init to set up the git-annex metadata. This is run once in each repository in the git-annex network:
$ git init
$ git annex init 'triton cluster' # give a name to the current repo
Level 1: locally locking and tracking data
You can add small files like normal using git (full content in git), and large files with git annex add, which replaces the file with a symlink to its locked content:
$ git add small-file.txt
$ git annex add large-file.dat
$ git commit # metadata: commit message, author, etc.
Now, your content is safe: it is a symlink to somewhere in .git/annex/objects and it is almost impossible for you to accidentally lose the data. If you do want to modify a file, first run git annex unlock, and then commit it again when done. The original content is saved until you clean it up (unless you configure otherwise). The largefiles settings determine the behavior of git add: you can set which files should always be committed to the annex (instead of git).
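For example, one way to configure this is through git config (a minimal sketch; the size threshold and pattern are arbitrary placeholders):
$ git config annex.largefiles "largerthan=100kb or include=*.dat"
$ git add big-data.dat    # matches the rule above, so it goes to the annex
$ git add notes.txt       # small file: stored in git as usual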
At this point, git push|pull will only move metadata around (the commit message and the link to .git/annex/objects/AA/BB/HHHHHHHH, where the hash HHHHH is a unique hash of the file contents). This is what is stored in the primary git history itself.
Structured metadata (arbitrary key/value pairs) can be assigned to any file with git annex metadata (and some can be automatically generated when files are first added, such as the date of addition). Files can be filtered and transferred based on this metadata. Structured metadata helps us manage data much better once we get to level 3.
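For example (a small sketch; the field name and values are made up):
$ git annex metadata -s experiment=run1 data/output1.dat   # attach key/value metadata
$ git annex find --metadata experiment=run1                # list files matching it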
So now, with little work, we have a normal git repository that provides a history (metadata) to other data files, keeps them safe, and can be used like a normal repository.
Relevant commands:
git annex init: activate an existing git repo for git-annex.
git annex add: add a file to the annex, possibly depending on various rules.
git annex unannex: opposite of git annex add.
git annex unlock: unlock an annexed file, so that it's a normal file and can be edited.
git annex lock: opposite of git annex unlock.
git annex metadata: show or set per-file metadata.
git annex info: info on various things.
Configuration:
annex.largefiles - rules for what should be automatically annexed.
Level 2: moving data
Data in one place isn’t enough, so let’s do more. Just like git remotes, git-annex remotes allow moving data around in a decentralized manner.
Regular git remotes work, if the git-annex shell tools are installed.
Git-annex special remotes, which essentially serve as key-value stores. Options include S3, cloud drives, rsync, and many, many more.
Regular git remotes are set up with git annex init on the remote side. Special remotes are created with git annex initremote. Every remote has a unique name and UUID to manage data locations.
Once the remotes are set up, you can move data around:
$ git annex get data/input1.dat # get data from any available source
$ git annex copy --to=archive data/input2.dat
You can remove data from a repo, but git-annex will actively connect to other remotes to verify that other copies of the file exist before dropping it:
$ git annex drop data/scratch1.txt
These commands move data around in .git/annex/objects/ and update tracking information on the special git-annex branch, so that git-annex knows which remotes have which files - very important to avoid a giant mess!
Special remotes can be created like such:
$ git annex initremote NAME type=S3 encryption=shared host=a3s.fi
And enabled in other git repositories to make more links within the repository network:
$ git annex enableremote NAME
Note that special remotes are client-side encrypted unless you set encryption=none, and content is also chunked to deal with huge files even on remotes which do not support them.
Relevant commands:
git annex get: use available knowledge to get a copy of files from remotes.
git annex drop: delete a file from current repo. By default, make sure other copies exist before doing this.
git annex move: move file contents
git annex copy: copy file contents
git annex list: list of files including where contents are stored
git annex find: list files matching pattern
git annex initremote: initialize a special remote (info will be synced)
git annex enableremote: use synced info to prepare an existing special remote for use.
Level 3: synchronizing data
Moving data is great, but when data becomes Big, manually managing it doesn't work. Git-annex really shines here. The most basic command is sync --content, which will automatically commit anything new (to git or the annex, depending on the largefiles rules) and distribute all data everywhere reachable (including regular git-tracked files). Without --content, it syncs only metadata and regular commits:
$ git annex sync --content
But "all data everywhere" doesn't scale to complex situations: we need to somehow define what goes where, and this should be done declaratively. One of the most basic declarations is the minimum number of copies allowed, numcopies. Git-annex won't let you drop a file from a repository without being very sure that this many copies exist in other repositories. This setting is synced through the entire repository network:
$ git annex numcopies N
The next level is preferred content, which specifies what files a given repository wants. git annex sync --content will use these expressions to determine what to send where:
$ git annex wanted . 'include=*.mp3 and (not largerthan=100mb) and exclude=old/*'
$ git annex wanted archive 'anything'
$ git annex wanted cluster 'present or copies=1'
Repository groups and standard groups allow you to define rules more easily (the standard groups list lets you see the power of these expressions). Various built-in background processes can automatically watch for new files and run git annex sync --content for you, which can make your data management a fully automatic process. Repository transfer costs allow git-annex to fetch data from a nearby source rather than a more distant one. Client-side encryption allows you to use any available storage with confidence.
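For example, a sketch using the built-in standard groups (the repository name mynas is a placeholder):
$ git annex group here client             # this repo wants everything except archived files
$ git annex wanted here standard          # use the group's standard preferred content
$ git annex group mynas archive           # archive repos want files not archived elsewhere
$ git annex wanted mynas standard
$ git annex sync --content                # distribute data according to these rules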
Relevant commands:
git annex sync [--content]: automatically commit/move data around based on the rules defined below.
git annex numcopies: set the default number of copies for every annexed file (minimum redundancy level).
git annex trust: mark a repo as trusted (it won't lose data, so you don't have to verify contents before deleting locally).
git annex untrust: opposite of git annex trust.
git annex wanted: set files which will be automatically synced to a repo.
git annex group: set a repo as part of a group.
git annex groupwanted: same as git annex wanted, but for groups.
git annex required: similar to git annex wanted, but prevents you from dropping the content unless you force it.
git annex unused: find older versions of files which are no longer referred to in the current version and can be dropped.
git annex schedule: manage background processes that run git annex sync.
git annex watch: monitor the current repo for changes and git annex sync when they happen.
See also
DataLad is a data-management focused interface for git-annex. This might be a better place to start. DataLad also handles submodules (useful for very large numbers of files) and running workflows and saving the metadata.
git LFS: often compared with git-annex. git LFS is created by GitHub and operates on a centralized model: there is one server, and all data goes there. This introduces a single point of failure, requires a special server capable of holding all data, and loses the distributed features. git-annex is a truly distributed system, and thus better for large-scale data management.
dvc: the level 1/2 use case is practically copied from git-annex. It seems to have much less flexibility for high-level data management and client-side encryption. The main point of dvc seems to be tracking the commands that have been run and their inputs/outputs to make those commands reproducible, which is completely different from git-annex. Most importantly (to the author of this page), it has default-on analytics sent to remote servers, which makes its ethics questionable.
Hybrid events
This page is our recommendations/ideas for hybrid events (in-person plus online components). It may be out of place on scicomp.aalto.fi, but it’s the best place we have to put them right now. Unlike other recommendations, this page is not just for teaching but applies to any type of event.
Why hybrid?
Why do you want hybrid, as opposed to online or in-person? If you can’t clarify the purpose to yourself, it may be hard to put on a successful event.
In-person gives better chances to talk in small groups and among your friends, both during and after the event. (Is your in-person event disadvantaging introverts or less well connected people?)
Online allows anyone to participate with a lower threshold. If you do it right, you could allow anyone in the world to take part.
As a side note, for massive events, participants can get a full experience by having their own group chat to discuss the topics, separate from the event chats.
General considerations
Plan and test early, don’t assume things work unless you experience it yourself.
The first time (or few times), have a separate “director” who can manage the online part and tech, so the hosts focus on hosting.
Related to the above (possibly the same person), have someone to help interface with the audience and relay questions from them to you, answer basic questions, etc. This person should be able to interrupt you immediately for pressing questions. For the largest events, have two: one person answering questions directly, one selecting and queuing questions for the speakers.
Audio is the most important part and will most often go wrong. Make sure you use microphones well, don't count on wide-area room mics, do an audio check days before and again immediately before, ask the audience if the audio is good, and make sure they will tell you immediately if problems develop or things get worse.
Consider activities during breaks for the people online. Yes, you need to go slowly enough to give people a chance to get their coffee, but can you also do something during the breaks? Are there some ways to facilitate online↔in-person networking during breaks?
The meeting begins well before the scheduled time for random discussion, and ends well after the scheduled time for post-meeting discussion. Don't end the online discussion right after the meeting (this is an important lesson even for online meetings!).
For the reasons above, you need more staff than a single-format event. For each role (registration, entertaining people during breaks, etc.), you will need someone to do the same thing for the online people, and usually it is better to have someone focusing on each audience (while working together to bring them together).
What about after the event? If you have streamed it, you could also record it. Can you do this while maintaining the privacy of all participants, so that this information is not lost and is reusable later? What follow-up communication and so on can you do? Start thinking of this early.
Feedback and interaction
One of the biggest advantages of online events is the combination of multiple communication channels, so that it is not just extroverts asking questions.
Have a clear way to get feedback (like Presemo). Make it very explicit how this works. Have some icebreaker polls/questions.
Require in-person audience to ask questions via the feedback tool, not via voice. Distributing microphones is a lot of work and will often be forgotten, and also voice questions bias towards extroverts, and you will be able to better order your answers. Text questions also allow other people to answer and give help at the same time. If a question becomes a discussion, you could distribute microphones.
When feedback and questions are done well, they can be published along with the talk (make sure you announce this in advance). Especially the “document-based” method below is very good for this, since it can be fixed up after the course.
Make sure that the current presenter can always see the questions. A good setup is a separate computer with a large font next to your presentation computer.
To encourage people to use this, it is best to also screenshare/project it, so that the audience can see that it is in active use. This takes some screen space, but can be well worth it if it increases interaction.
If the text communication tool is the same as the one used for the rest of the event, and has good threading support, then you get even more synergies.
There are different types of feedback tools:
Chat is simple, but linear and thus questions can easily get lost, and answers are hard to connect to questions. The advantage is it is usually built-in to meeting software.
Feedback tools like Presemo (https://presemo.aalto.fi) allow basic questions, voting, and replies.
Documents (google docs, HackMD, etc) allow free-form text. The general idea is people write a free-form question or comment at the bottom of the document, and bullet points are used to give answers or replies. This requires some getting used to and has risk of trolling in extremely large events, but when this works, it works well. See the CodeRefinery HackMD mechanics for an example and advice.
Tech: Zoom
Zoom and other meeting software have many of the features needed for an easy, self-service hybrid event. We assume you know how to use Zoom (or equivalent) by yourself for an online meeting; here we describe the changes for hybrid events.
The advantage of using normal meeting software is that you don’t need to learn a new tool and it is perfectly reasonable to do everything self-service.
Classrooms set up for hybrid work have camera inputs hooked up to the room cameras. There is a separate control panel for switching and rotating the cameras. Play around with the controls to learn how they work. Select the right input.
Zoom can equally share the screen like normal.
If you present from your own computer, you can run zoom on your computer to share screen, and use the room computer to share the camera view + sound. You can tell any other presenters to do the same.
Consider how you screenshare if it should be a two-way meeting (online audience should be visible to local audience):
Zoom in “Dual monitor mode” (find under general settings) actually produces two windows, one with the {current speaker or screenshare} and one with the gallery. If you have two monitors in the room, this makes a great experience: the entire gallery is visible and if someone uses zoom “raise hand”, it is apparent to everyone.
If you do the above, the current speaker can present from their desk via screenshare. This may be easier than transferring to the presentation computer.
Remember to share the collaborative notes, agenda, and/or chat by default, so that people are motivated to use that instead of speaking over each other.
Remember the benefits of being online. Providing slides and material in advance allows online (and in-person) people to use multiple channels at the same time, if it suits them.
Zoom audio in a classroom
As described above, audio is one of the most important considerations. In principle it is easy, but there are many details to consider.
The first is your goals: we have three categories, (presenter), (in-person audience), (online audience). Which of them should hear each other?
The main thing is to prevent audio feedback. To solve this, it is important to have one machine as the audio master in the room (it has both the microphone and speakers connected to it). This also prevents the presenter from having their audio go back into the room via the online meeting.
Presenter → online can be done with microphones connected to a computer, for example the classroom computer connected to the microphones or a bluetooth microphone.
In-person audience → online, in practice, needs to be done by passing around microphones. A wide-area microphone might work, or might not.
Online → in-person is a bit more interesting. You can connect the audio computer to the speakers in the room (or external speakers). You will need to position the speakers to avoid feedback into the microphones as much as possible, and adjust all the different volumes.
To adjust for the different sound levels of the different groups, you might need someone to continually monitor and adjust the volumes of the various microphones separately.
Overall, you could say that voice communications is the main point of in-person meetings. But it is also the hardest to scale to a large audience. Consider if you can get text feedback and interaction working well, and then perhaps you could skip audio - and perhaps the entire effort of a hybrid event?
Tech: dedicated A/V setup
We have put on an event with a dedicated A/V setup, with external microphones, etc. In the end, it also used Zoom to broadcast to the world, so was quite similar to the above. Perhaps this recommendation is obsolete and one should just use the above as a starting point?
TODO: more info
Tech: live streaming
For the largest events, meeting software doesn't work: you have to manage all the participants, and any one participant can disrupt the event for everyone else. The "live streaming" model is much better in this case: it is a one-to-many broadcast, not a many-to-many meeting. Live streaming is popular these days, so you can find many user-friendly but powerful tools.
For now, see CodeRefinery manuals on the MOOC strategy for a detailed description.
See also
Aalto University links:
Rooms with lecture capture built-in (or filter by “Lecture capture” in booking system): https://wiki.aalto.fi/display/OPIT/Lecture+capture+spaces
Hybrid teaching recommendations (not really focused on technology, but how to engage): https://wiki.aalto.fi/display/OPIT/Hybrid+teaching+in+Aalto+University
Another lecture Zoom-capture idea (Uses a smartphone and a bluetooth microphone, simple but may miss some communication channels. This could be combined with the above.): https://wiki.aalto.fi/display/OPIT/Zoom#expand-Case1Onlineandinpersonlecturesimultaneously
Pitfalls of Jupyter Notebooks
Jupyter Notebooks are a great tool for research, data science type things, and teaching. But they are not perfect - they support exploration, but not other parts of the coding phase such as modularity and scaling. This page lists some common limitations and pitfalls and what you can do to avoid them.
Do use notebooks if you like, but keep in mind their limitations and how to avoid them, so you can get the best of both worlds.
None of the limitations on this page are specific to notebooks - in fact we’ve seen most of them in scripts long before notebooks were popular.
Modularity
We all agree that code modularity is important - but Jupyter encourages you to put most code directly into cells so that you can best use interactive tools. But to make code the most modular, you want lots of functions, classes, etc. Put another way, the most modular code has nothing except function/class/variable/import definitions touching the left margin - but in Jupyter, almost everything touches the left margin.
Solutions:
Slowly work towards functions/classes/etc where appropriate, but realize it’s not as easy to inspect their insides as non-function code.
Be aware of the transition to modules - do it when you need to. See the next point.
Try to plan so it’s not too painful to make the conversion when the time comes.
Transitioning to modules
You may start coding in notebooks, but once your project gets larger, you will need to start using your code in more places. Do you copy and paste? At this point, you will want to split your core code into regular Python modules, import them into your notebooks, and use the notebooks as an interface to them - so that the modules are somewhat standard working code and the notebooks are the exploration and interactive layer. But when does that happen? It is difficult to make that transition unless you really try hard, because it's easier to just keep on going.
Solutions:
Remember that you will probably need to form a proper module eventually. Plan for it and do it quickly once you need to.
Make sure your notebooks aren't disconnected from your own Python code in modules/packages.
You can set modules to automatically reload with %load_ext autoreload, %autoreload 1, and then %aimport module_name. Then your edits to the Python source code are immediately used without restarting, and your work is not slowed down much (see the example cell below). See more at the IPython docs on autoreload (note: this is Python kernel specific).
importnb can import notebooks as modules - but if you get to this point, maybe you need to rethink your goal.
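For example, the first cell of a notebook that uses code from a hypothetical module mymodule.py could be:

%load_ext autoreload
%autoreload 1
%aimport mymodule

After this, later cells pick up your edits to mymodule.py without restarting the kernel.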
Difficulty to test
For the same modularity reasons outlined above, it's hard to test notebooks using traditional unit-testing tools (if you can't import notebooks into other modules, you can't do much). Testing is important to ensure the correctness of code.
Solutions:
Include mini-tests / assertions liberally (see the sketch below this list).
Split into modules when necessary - maybe you only create a proper testing system once you transition to modules.
Various extensions to pytest that work with notebooks
nbval, pytest-notebook: run notebook, check actual outputs match outputs in ipynb.
pytest-ipynb: cells are unit tests
This list isn't complete, nor is it a recommendation.
But just like with modularity above, a notebook designed to be easily testable isn’t designed for interactive work.
Transition to modules instead of testing in the notebook.
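A sketch of the mini-test/assertion idea (with a made-up function): a cheap check directly in a cell that fails loudly if an assumption breaks, without any testing framework.

# In a notebook cell: a tiny sanity check of our own function
def normalize(values):
    total = sum(values)
    return [v / total for v in values]

result = normalize([1.0, 2.0, 3.0])
assert abs(sum(result) - 1.0) < 1e-9, "normalized values should sum to 1"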
Version control
Notebooks can't be version controlled well, since they are stored as JSON. Of course, they can be version controlled (and should be), and there are a variety of good solutions, so this shouldn't stop you.
Solutions:
Don’t let this stop you. Do version control your notebooks (and don’t forget to commit often!), even if you don’t use any of the other strategies.
nbdime - diffing and merging, VCS integration
The JupyterLab / Notebook git integrations work well.
Notebooks in other plain-text formats: Rmarkdown, Jupytext (pair notebooks with plain text versions).
Remember, blobs in version control are still better than nothing.
Notebooks aren’t named by default
This is really small, but notebooks aren't named by default. If you don't name them well, you will end up with a big mess. Also somewhat related, notebooks tend to purpose-drift: they start as one thing and end up with a lot of random stuff in them. How do you find what you need? Obviously this isn't specific to notebooks, but the interactive nature and modularity-second design make the problem more visible.
Solutions:
Remember to name notebooks well, immediately after making them.
Keep track of when they start to feature-drift too much, or have too many unrelated things in them. Take some time to sort your code logically once that happens.
Difficult to integrate into other execution systems
Notebooks are designed for interactive use - you can run them from the command line with various tools, but there's no good command-line interface to pass arguments, input, output, and so on. So you write one notebook, but can't easily turn it into a flexible script to be used many times.
Solutions:
Modularize your code and notebooks. Use notebooks to explore, scripts to run in bulk.
Create command-line interfaces to your libraries and use those instead of notebooks (see the sketch below this list).
There are many different tools to parameterize and execute notebooks (nbscript, described below, is one of them), if you think you can keep stuff organized - and plenty more exist.
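A minimal sketch of the library-plus-CLI approach, assuming a hypothetical module mymodule with a run_analysis function:

# cli.py - a thin command-line wrapper around library code
import argparse

from mymodule import run_analysis   # hypothetical library function


def main():
    parser = argparse.ArgumentParser(description='Run the analysis')
    parser.add_argument('input', help='Input data file')
    parser.add_argument('--output', default='results.csv',
                        help='Where to write the results')
    args = parser.parse_args()
    run_analysis(args.input, args.output)


if __name__ == '__main__':
    main()

The same run_analysis can still be imported and explored in a notebook.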
Jupyter disconnected from other computing
This is also a philosophical one: some Jupyter systems are designed to insulate the user from the complexities of the operating system. When someone needs to go beyond Jupyter to other forms of computing (such as ssh on cluster), are they prepared?
Solutions:
This is more of a mindset than anything else.
System designers should not go through extra efforts to hide the underlying operating system, nor separate the Jupyter systems from other systems.
Include non-Jupyter training, some intro to the shell, etc. in the Jupyter user training.
Summary
Notebooks can be great for starting projects and interactive exploration. However, as a project gets more advanced, you will eventually find that the linear nature of notebooks is a limitation, because code can not really be reused. It is possible to define functions/classes within the notebook, but you lose the power of inspection (they are just seen as single blocks) and can't share code across notebooks (and copy and paste is bad). This doesn't mean you shouldn't use notebooks: but do keep this in mind, and once your methods are mature enough (you are using the same code in multiple places), try to move the core functions and classes out into a separate library, and import this into the day-to-day exploration notebooks. For more about problems with notebooks and how to avoid them, see the fun talk "I don't like notebooks" by Joel Grus. These fixes are not specific to notebooks, and they will make your science better.
In a cluster environment, notebooks are inefficient for big calculations because you must reserve your resources in advance, but most of the time the notebooks are not using all their resources. Instead, use notebooks for exploration and light calculation. When you need to scale up and run on the cluster, separate the calculation from the exploration. Best is to create actual programs (start, run, end, non-interactive) and submit those to the queue. Use notebooks to explore and process the output. A general rule of thumb is “if you would be upset that your notebook restarted, it’s time to split out the calculation”.
Notebooks are hard to version control, so you should look at the Jupyter diff and merge tools. Just because notebooks are interactive doesn't mean version control is any less important! The "split core functions into a library" advice is also related: that library should be in version control, at least.
Don’t open the same notebook more than once at the same time - you will get conflicts.
References
The funny talk "I don't like notebooks" by Joel Grus provided the starting point for this list.
nbscript: run notebooks as scripts
Warning
This page and nbscript are under active development.
Notebooks as scripts?
Jupyter is good for interactive work and exploration, but eventually you need more resources than an interactive session can provide. nbscript is a tool (written by us) that lets you run Jupyter notebooks just like you would Python files. (nbscript main site)
See also
Other tools: There are other tools that run notebooks non-interactively, but (in my opinion) they treat command-line execution as an afterthought. There is a long-standing standard for running scripts on UNIX-like systems, and if you don't use that, you stay locked in to Jupyter-specific tooling: the two worlds should be connected seamlessly. Links to more tools here.
Once you start running notebooks as scripts, you really need to think about how modular your whole workflow is. Mainly, think about dividing your work into separate preprocessing ("easy"), analysis ("takes lots of time and memory"), and visualization/post-processing ("easy") stages. Only the analysis phase needs to be run non-interactively at first (to take advantage of more resources or to parallelize), but the other parts can still be done interactively through Jupyter. You also need to design the analysis part so that it can run on a small amount of data for development and debugging, and on the whole data for the actual processing (see the sketch after the list below). You can read more general advice at Jupyter notebook pitfalls.
Concrete examples include:
Run your notebook efficiently on a separate machine with GPUs.
Run your code in parallel with many more processors
Run your code as a Slurm batch job or array job, specifying exactly the resources you need.
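A minimal sketch of the small-data design mentioned above (the option names are made up):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('input', help='Input data file')
parser.add_argument('--limit', type=int, default=None,
                    help='Process only the first N records (development mode)')
args = parser.parse_args()

with open(args.input) as f:
    records = f.readlines()
if args.limit is not None:
    records = records[:args.limit]

# ... the heavy analysis then runs on `records`, small or full ...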
nbscript basics
The idea is that nbscript input.ipynb has exactly the same kind of interface you expect from bash input.sh or python input.py: command line arguments (including input files) and printing to standard output. Since notebooks don't normally have any of these concepts, and you probably still want to run the notebook through the Jupyter interface, there is a delicate balance.
Basic usage from the command line (to access these command line arguments inside the notebook, see the next section):
$ nbscript input.ipynb [argument1] [argument2]
If you want to save the output automatically, and not have it printed to standard output:
$ nbscript --save input.ipynb # saves to input.out.ipynb
$ nbscript --save --timestamp input.ipynb # saves to input.out.TIMESTAMP.ipynb
If you want to submit to a cluster using Slurm, you can do that with snotebook. These runs automatically use --save --timestamp to save the output:
$ snotebook --mem=5G --time=1-12:00 input.ipynb
Setting up your notebook
You need to carefully design your notebook if you want it to be usable both as a script and through Jupyter. This section gives some common patterns you may want to use.
Detect if your notebook is running via nbscript, or not:
import nbscript
if nbscript.argv is not None:
    # We *are* running through nbscript
    ...
Get the command line arguments through nbscript; this is None if you are not running through nbscript:

import nbscript
argv = nbscript.argv
You can use argparse as normal to parse arguments when non-interactive (taking argv from above):
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('input', help='Input file')
args = parser.parse_args(args=argv)
Save some variables to a file when running through nbscript, so that you can load the results later in an interactive session:

import pickle

if nbscript.argv is not None:
    state = dict(results=some_array,
                 other_results=other_array,
                 )
    with open('variables.pickle', 'wb') as f:
        pickle.dump(state, f, pickle.HIGHEST_PROTOCOL)
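Back in an interactive session, you can then load the saved results (a sketch matching the snippet above):

import pickle

with open('variables.pickle', 'rb') as f:
    state = pickle.load(f)
some_array = state['results']
other_array = state['other_results']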
Don't run the main analysis when interactive:

if nbscript.argv is not None:
    # The heavy analysis runs only through nbscript,
    # not in the interactive Jupyter interface
    ...
Running with Slurm
Running as a script is great, but you also need to submit to your cluster. nbscript comes with the command snotebook to make it easy to submit to Slurm clusters. It's designed to work just like sbatch, but directly submits notebook files without needing a wrapper script.
snotebook is just like nbscript, but submits to Slurm (via sbatch) using any Slurm options:
$ snotebook --mem=5G --time=1-12:00 input.ipynb
$ snotebook --mem=5G --time=1-12:00 input.ipynb argument1.csv
By default, this automatically saves the output to input.out.TIMESTAMP.ipynb, but this can be configured.
You can put normal #SBATCH comments in the notebook file, just like you would when submitting with sbatch. But, they will only be detected from the very first cell that has any of these arguments, so don't split them over multiple cells. Example:
#SBATCH --mem=5G
#SBATCH --time=1-12:00
Just like with sbatch, you can combine command line options and in-notebook options.
See also
nbscript main page, with more information.
New group leaders: what to know about computational research
As a new group leader, how can you make the most of your future group’s computational work, so that it becomes an investment rather than a liability? This is currently focused on software.
If you are actively writing research software yourself, perhaps directly check out The Zen of Scientific computing instead of this for the more practical side.
About you
Are you planning a research group which partly uses computing?
Is computing not your main thing (not what you want to focus on, nor what you studied)?
Do you want your new hires to use best practices, even if you can’t mentor them yourself?
Do you want your research to be reproducible and open?
Why plan in advance?
Your group’s work is valuable.
Over time, your work’s value can grow…
… or it can be lost every 5 years as your group changes.
What usually goes wrong?
At a group level, these often happen to semi-computational groups:
Every researcher starts a project over from scratch
Researchers leave, previous work becomes unusable (your group completely changes every ~5 years!)
If you don't work at it, your group's software and data get more and more disorganized, until they become unusable. This limits what you can do in the future.
At an individual level:
Time wasted with bugs
Time wasted when one can’t repeat analysis for reviews
Desire to hide or not share code because it’s “messy”, which promotes the above cycle continuing. And less Open Science.
Step 1: Define how you work together
This is kind of meta, but: do you want to be a group of people connected by a supervisor, or a team that works together?
Is co-working limited to coffee chats and presentations at group meetings?
Do these presentations comment only on the final results?
Or do you discuss and praise good practices for getting those results?
Are some meetings spent on skill development?
Or on the other end, are you co-developing the same project?
Are you a team, or a bunch of independent contractors?
Suggestions
Don’t be only results oriented in your group activities. Make sure you value the process with both your time and mental energy.
Planning vs writing a plan
Plans are useless, planning is indispensable - Dwight Eisenhower
Different grants request that you make a data management plan, and I've seen ideas of a software management plan for the future.
If you are making a plan just for a grant, I think that's the wrong idea. You want everything you do to go beyond single projects.
Suggestions
Make a “practical plan” for important aspects, in your group’s documentation area: “here is where you find our data”, “here is where we share code”, etc. Keep it lightweight but useful.
Designate it as part of onboarding.
Update it as needed.
Group documentation, “group wiki”
A single place for reference on group practices helps with onboarding and keeping things consistent and usable over time.
A group wiki is a good place to start.
Minimum documentation about how you want things done - or how they are actually being done.
But not so strict that you can’t make progress in the future.
Index of important software, data, and other resources
But the description of the software/data should live with it, not in the group docs.
Can you make everything open? For example, your group website could contain this reference information, so that it also serves as an advertisement.
Suggestions
If in doubt, make a group wiki
Use it to keep your group’s internal operating information organized - however makes sense for you.
When you hear of someone doing something new, ask: “did you update this in our wiki?”
Skill development
Many people learn basic programming. Far fewer people learn best practices beyond programming:
This, especially version control, is covered very well in the CodeRefinery workshop, twice a year.
Consider attending a CodeRefinery as a team
If you use lots of computing: Aalto Introduction to Scientific Computing and HPC
Train early, before getting started with bad practices that can’t be changed.
But there is also informal learning, mentoring:
You learn more from co-working than courses.
You need good, active mentoring (not weekly status checks, but real co-working)
Desks next to each other, where you can see each other's screens
Pair programming
But, as an academic supervisor, you probably don’t have time to mentor. How do you get mentoring?
Set up group to work together
Time and motivation for self-learning
Encourage an internal specialist who can mentor for you (a "Research software engineer").
Suggestions
Everyone in your group attends a CodeRefinery workshop
At least one group member is developed into a computational specialist and supports others.
Why talk so much about teaching and mentoring, rather than practices?
Unlike many topics, we can’t rely on academic courses to prepare your group members.
In all my experience, good software and data practices come from sharing good internal practices.
I know supervisors can’t do everything, but hopefully they can promote what they need internally.
Software in research
Software allows you to do far more than you could alone, and it can transform research.
… but can also be one of the most complex tasks you do.
What kind do you use?
You can and will use software developed by others
Many groups develop their own internally.
If you make something good, you may want to release it so that others can use it - and cite you.
Software: tools
We give a lightning overview. Come to CodeRefinery for the full story.
Version control
Tracks changes
solves: Everything just broke but I don’t know what I changed.
solves: I’m getting different results than when we submitted the paper.
Allows collaboration
solves: “can you send me the latest version of the code”
solves: “we’re using two different versions, too bad”
Creates a single source of truth for the code
Not different versions scattered around on everyone's computers
Most common these days: git
Suggestions
Everyone must learn the basics of a version control system (CodeRefinery week 1 does this).
Find a source of advanced support (your specialist group member or some other university service)
Github, Gitlab, etc.
Version control platforms
Online hosting platforms for git (others available)
Very useful to keep stuff organized
Makes a lot of stuff below possible.
Individual projects and organizations with members - for your group.
Suggestions
Make one public Github/Gitlab organization for your group
Make one internal Gitlab organization hosted at your university.
Strongly discourage personal repositories for common code.
Issue tracking
Version control platforms provide issue trackers
Important bugs, improvements, etc. can be closely tracked.
Suggestions
Use issues for your most important common projects
Change proposals (aka “pull requests”)
Feature of version control platforms like Github or Gitlab
People should work together, but maybe not everyone should be able to modify everything, right?
Contributors (your group or outside) can contribute without risk of messing things up.
For this to work you need to actually review, improve, and accept them
Suggestions
Decide which projects are important enough for a more formal change process.
Use pull requests for these projects which should not be broken.
Testing
How do you know your code is correct? Try running it, right?
But what happens if you change it later?
Software testing is a concept of writing tests, which can automatically verify functionality.
You write tests, and then anytime you make a change later, the tests verify it still works.
Suggestions
Each moderately important project has some test data and can automatically run something
More important projects: add in as many tests as practical
Documentation
Documentation enables reusability.
Minimum is Readme files in each repository.
Big projects can have dedicated documentation.
Suggestions
Every project gets a README file. As supervisor, read these README files and confirm what they contain.
Dedicated, in-repository documentation for large projects (for example Sphinx)
Licensing
Reuse gets you citations
Reuse requires a license - without one, there will be little significant reuse.
You will often need to check your local policies on making something open source.
Suggestions
Decide (with stakeholders) on a license as early as possible - use only open-source licenses unless there is special reason. You don’t have to actually open right away.
Try to focus on using similarly licensed things.
Publication and release
If you invest in your software, you probably want to share it
“If we release a paper on some method, and we don’t include easy to use software to run it, our impact will be tiny compared to what it could be.” - CS Professor
Good starting point: make the repository open on Github/Gitlab
Can also be archived on Zenodo (or other places) to make it citeable.
Do all work expecting that it might be made open someday. Separate public and secret information into different repositories.
Suggestions
Public on GitHub/GitLab as soon as possible
Next level is releases on package indexes
You can make software papers later (when relevant)
Working together on code
Group discussion: What can go wrong when people work together?
Other computational topics
… not exactly software, but still relevant to this discussion.
Data storage
Discourage single-user storage spaces (laptop, home directories)
Use common shared spaces instead
Network drives
Usually used via a remote system
Some can be locally mounted on your own laptop for ease of use
Not the best for people who want to work on their own computer, but it works. Data can be synced.
Aalto Scientific Computing strategy:
All mass storage provided in shared group directories.
Request as many as you want - each one has its own unique access control.
Access and data can be passed on as the group evolves.
Suggestions
Have a plan. People know where central storage is and at least one copy must be there.
Request central network drive storage if possible.
Ask your group members: “Where is your data? Is the data documented?”
Data storage locations at Aalto University
Own devices
Danger, no backups! Personal devices are considered insecure.
Aalto home directories
Aalto network drives
Large, secure, backed-up. Request from your department or from Aalto IT Services.
10-100 GB range is easy.
Triton HPC Cluster
Very large, fast, direct cluster access, but not backed up.
10s-100s of TB.
CSC data storage resources
Public data repositories
For open data
Computing
There is a range of computing options, from (easy to use, small) to (harder to use, large):
Own devices
Remote servers
Remote computer clusters
Aalto
CSC
Support
It’s dangerous to go alone. Take us!
There were many things above.
Hopefully you got some ideas, but I don’t think that anyone can do this alone (I learned everything by working with others)
Rely on support and mentoring.
Some possibilities, if you are at Aalto:
At Aalto: Daily Scientific Computing garage
At Aalto: Data Agents
Suggestions
Ensure your group members come to the garage if they have questions you can't answer.
Come to an RSE consultation and chat at least once when getting your group started.
Summary: dos and don’ts
You are not allowed to
Not use version control
Not push to online repository
Have critical data or material only on your own computer.
Make something so chaotic that you can’t organize it later
Go alone
… but you don’t have to
Start every code perfectly
Do everything perfectly
… as long as you can improve it later, if needed.
Know everything yourself.
Checklist
Set up group reference information (for example, wiki).
Work with your supporters to create a basic outline of a plan.
Set up Github organization for group code
Set up Gitlab organization for internal work (university Gitlab)
Create your internal data/software management plan.
(Think what code/data will be most reused, put it in one place, and make it reusable.)
Send group members to CodeRefinery as they join.
See also
The Zen of Scientific computing - different levels of different aspects you can slowly improve. Emphasizes that you don’t have to be perfect when you first start.
Package your software well
This page gives hints on packaging your research software well, so that it can be installed by others.
As HPC cluster administrators, we spend a lot of time trying to install very difficult software. Many users want to use a tool released by someone, but it turns out not to be easy to install. Don't let that happen to your code - keep the following things in mind, even at the beginning of your work. Do you want your code to be reused, so that you can be cited?
This page is specifically about packaging and distribution, and doesn’t repeat standard programming practices for scientists.
Watch a humorous, somewhat related talk “How to make package managers cry”.
Application or library
Application: Runs alone and does not need to be combined with other software. Note that if your application is expected to be installed in an environment that is shared with other software, it is more like a library - and this is how most scientific software is installed!
Library: Runs embedded and connected with other software that is not under your control. You can’t expect everything else to use the exact versions of software that you need.
The dependency-related topics below mostly apply to libraries - but as the note above says, in practice they affect many applications, too.
Use the proper tools
Each language has some way(s) to distribute its code “properly”. Learn them and use them. Don’t invent your own way of doing things.
Use the simplest, most boring, reliable, and mainstream system there is (that suits your needs).
Minimize dependencies
Build off of what others make; don't re-invent everything yourself. But at the same time, see if you can avoid random unnecessary dependencies, especially ones that are not well packaged and well maintained. They will make your life, and others' lives, worse.
Don’t pin dependencies
Don’t pin exact versions of dependencies in a released library. Imagine if you want to install several different libraries that pin slightly different versions of their dependencies. They can’t be installed together, and the dependency solver may take a long time trying before it gives up.
But you do often want to pin dependencies for your environments - for example, the exact collection of software you are using to make your paper. This keeps your results reproducible, but it is a different concept than releasing your software package.
Don't pin dependencies strictly when someone may indirectly use your software in combination with arbitrary other packages. You should have a particular reason for each pin you have, not just "something may break in the future". If the chances of something breaking in the future are really that high, you should wonder whether you should recommend others to use this at all until that can be taken care of (for example, build on a more stable base).
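As a sketch in setuptools terms (the package name and version numbers are made up), the difference between a flexible constraint and an exact pin looks like:

from setuptools import setup

setup(
    name='mypackage',        # hypothetical package name
    version='1.0',
    install_requires=[
        'numpy>=1.20',       # flexible: installable alongside other packages
        # 'numpy==1.20.3',   # exact pin: avoid this in a released library
    ],
)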
You’ll notice that a lot of these topics deal with dependencies. Dependency hell is a real thing, and you should carefully think about them.
Be flexible on dependencies
Following up from above, be as flexible about dependencies as possible. Don't require the newest version just because it's the newest.
If you have to be strict on dependencies because the other software is changing behavior all the time, perhaps it’s not a good choice to build on. Maybe there’s no other choice, but that also means that you need to realize that your package isn’t as reusable as you might hope.
Try to be robust in dependencies
Follow the robustness principle to the extent possible: "Be conservative in what you do, be liberal in what you accept from others". Try to be as resistant as possible to dependencies changing, while providing a stable interface for other things. Of course, this is hard, and you need a useful balance. For "resistance to dependencies changing", I interpret this as being careful about which interfaces I use, and seeing if I can avoid using things I consider likely to change in the future.
Of course, robustness applies to other aspects, too.
Have tests
Have at least some basic automated tests to ensure that your code works in conjunction with all the dependencies. Perhaps also have a minimal example in the README file that someone can use to verify that they installed properly (could be the same as the tests). The tests don’t have to be fancy, even something that runs the code in a full expected use case will let you detect major problems early. This way, when someone is installing the software for someone else, they can check if they did it correctly.
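A sketch of such a basic test (the package and function names are hypothetical), runnable with pytest or even plain Python:

# test_smoke.py - check that the package imports and a basic call works
from mypackage import analyze   # hypothetical entry point


def test_analyze_runs():
    result = analyze([1, 2, 3])
    assert result is not None


if __name__ == '__main__':
    test_analyze_runs()
    print('ok')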
Don’t expect the latest OS
Don’t design only for the latest and greatest operating system: then, many people who can’t upgrade right away won’t be able to use it easily. Or, they’ll have to go through extra effort to install newer runtimes on their older operating system.
For example, I usually try to make my software compatible with the latest stable operating systems from one year ago, and the latest Python packages from two years ago. This has really reduced my stress in moving my code around, even if it does mean I have to wait to use some new features.
Test on different dependency versions/OSs/etc
This starts to get a little bit harder, but it’s good to test with diverse operating systems or versions of your key dependencies. This probably isn’t worth it in the very early phases, but it is easier once you start using continuous integration / automated testing. Look into these once you get advanced enough.
Most clusters have different and older operating systems than you'd use on your desktop computer.
A container does not replace good packaging
"I only support using the Docker container" does not replace good packaging as described above. At the very least, it assumes that everyone can use Docker/Singularity/the container system of the year on the systems they need to run on. Second, what happens if they need to combine your software with other software?
A container is a good way to make compute easier and move it around, but make good packaging first, and use that packaging to install in the container.
Other
There is plenty more you should do, but it’s not specific to the topic of this page. For example,
Have versions and releases
Use a package repository suitable to your language and tool.
Have good documentation
Have a changelog
etc…
See also
Python
Note: for Triton-specific instructions, see the Triton python page. For Aalto Linux workstation specific stuff, see the Aalto python page.
Python is a widely used high-level programming language, popular in many branches of science.
Python distributions
Use case | Python to use | How to install own packages
---|---|---
Simple programs with common packages, not switching between Pythons often | Anaconda 2/3 |
Most of the use cases, but sometimes different versions of modules needed | Anaconda 2/3 | conda environment + conda
Special advanced cases | Python from module system | virtualenv + pip install
There are two main versions of Python: 2 and 3. There are also different distributions: the "regular" CPython that is usually provided with the operating system, Anaconda (a package containing CPython + a lot of other scientific software all bundled together), and PyPy (a just-in-time compiler, which can be much faster for some use cases).
For general scientific/data science use, we suggest that you use Anaconda. It comes with the most common scientific software included, and is reasonably optimized.
PyPy is still mainly for advanced use (it can be faster under certain cases, but does not work everywhere). It is available in a module.
Installing your own packages with "pip install" won't work unless you have administrator access, since it tries to install globally for all users. Instead, you have these options:
pip install --user: install a package in your home directory (~/.local/lib/pythonN.N/). This is quick and effective, but if you start using multiple versions of Python, you will start having problems, and the only recommendation will be to delete all modules and reinstall.
Virtual environments: self-contained Python environments with all of their own modules, separate from any other. You can install any combination of modules you want, and this is the most recommended option.
Anaconda: use conda, see below
Normal Python: virtualenv + pip install, see below
Installing own packages: Virtualenv, conda, and pip
You often need to install your own packages. Python has its own package manager system that can do this for you. There are three important related concepts:
pip: the Python package installer. Installs Python packages globally, in a user's directory (--user), or anywhere. Installs from the Python Package Index.
virtualenv: creates a directory with a self-contained set of packages, manageable by the user themself. When the virtualenv is activated, the operating-system global packages are no longer used; instead, you install only the packages you want. This is important if you need to install specific versions of software, and it also provides isolation from the rest of the system (so that your work is not interrupted). It also allows different projects to have different versions of things installed. virtualenv isn't magic: it could almost be seen as just manipulating PYTHONPATH, PATH, and the like. Docs: https://docs.python-guide.org/dev/virtualenvs/
conda: sort of a combination of a package manager and virtual environments. However, it only installs packages into environments, and it is not limited to Python packages: it can also install other libraries (C, Fortran, etc.) into the environment. This is extremely useful for scientific computing, and is the reason it was created. Docs for envs: https://conda.io/projects/conda/en/latest/user-guide/concepts/environments.html
So, to install packages, there is pip and conda. To make virtual environments, there is venv and conda.
Advanced users can see this rosetta stone for reference.
Anaconda
Anaconda is a Python distribution by Continuum Analytics. It is nothing fancy: they take a lot of useful scientific packages, put them all together, make sure they work, and do some sort of optimization. They also include all of the libraries needed. It is all open source, and is packaged nicely so that it can easily be installed on any major OS. Thus, for basic use, it is a good base to start with. virtualenv does not work with Anaconda; use conda instead.
Conda environments
See also
Watch a Research Software Hour episode on conda for an introduction + demo.
A conda environment lets you install all of your own packages. For instructions on how to create, activate, and deactivate conda environments, see http://conda.pydata.org/docs/using/envs.html .
A few notes about conda environments:
Once you use a conda environment, everything goes into it. Don't mix versions with, for example, local packages in your home directory - eventually you'll get dependency problems.
The same often goes for other Python-based modules: we have set up many modules that use Anaconda as a backend, so mixing might work if you know what you are doing.
The commands below will fail:

conda create -n foo pip  # tries to use the global environments directory - use --prefix instead
conda create --prefix $WRKDIR/foo --clone root  # fails because our anaconda module has additional packages (e.g. via pip) installed
Basic pip usage
pip install by itself won't work, because it tries to install globally. Instead, use:

pip install --user PACKAGE_NAME
Warning! If you do this, the modules will be shared among all your projects. It is quite likely that eventually you will get some incompatibilities between the Python you are using and the modules installed. In that case, you are on your own (the simple recommendation is to remove all modules from ~/.local/lib/pythonN.N and reinstall). If you get incompatible-module errors, our first recommendation will be to remove everything installed this way and not do it anymore.
Python: virtualenv
virtualenv is the default-Python way of making environments, but it does not work with Anaconda.
# Create environment
virtualenv DIR
# activate it (in each shell that uses it)
source DIR/bin/activate
# install more things (e.g. ipython, etc.)
pip install PACKAGE_NAME
# deactivate the virtualenv
deactivate
Linux shell crash course
Note
This is a kickstart for the Linux shell, to teach the minimum amount needed for any scientific computing course. For more, see the linux shell course or the references below.
This is basic B-level: no prerequisites.
Watch this in video format
There is a companion video on YouTube, if you would also like that format (and a slightly longer one with more detail).
If you are reading this, you probably need to do some sort of scientific computing involving the Linux shell, or command line interface. You may wonder why we are still using a command line today, but the answer is somewhat simple: once you are doing scientific computing, you eventually need to script and automate something. The shell is the only method that gives you the power to do anything you may want.
These days, you don’t need to know as much about the shell as you used to, but you do need to know a few important commands because the command line works when nothing else does - and you can’t do scripting without it.
What’s a shell?
It’s the old-fashioned looking thing where you type commands with a keyboard and get output to the screen. It seems boring, but the real power is that you can script (program) commands to run automatically - which is the point of scientific computing.
You type a command, which may include arguments, and the output gets shown on the screen. Spaces separate commands and arguments. Example: cp -i file1.txt file2.txt. Here cp is the command, -i is an option, and file1.txt and file2.txt are arguments. The meaning of each option and argument is completely determined by the program itself.
There are some conventions for options. For example, --help or -h usually prints some help.
Files are represented by filenames, like file.txt. Directories are separated by /; for example, mydir/file.txt is file.txt inside of mydir.
Exercise: Start a shell. On Linux or Mac, the “terminal” application does this.
Editing and viewing files
nano is an editor which allows you to edit files directly from the shell. This is a simple console editor which always gets the job done: use Control-x (control and x at the same time), then y when asked, and enter, to save and exit.
less is a pager (file viewer) which lets you view files without editing them. (q to quit, / to search, n / N to repeat the search forwards and backwards, < for the beginning of the file, > for the end of the file.)
Listing and moving files
ls lists the current directory. ls -l shows more information, and ls -a shows hidden files. The options can be combined: ls -la or ls -l -a. This pattern of options is standard for most commands.
mv will move or rename files. For example, mv file.old file.new.
cp will make a copy of a file, with the exact same syntax as mv: cp file.old file.copy.
rm will remove a file: rm file.txt. To remove a directory, use rm -r. Note that rm does not have backups and does not ask for confirmation!
mkdir makes a directory: mkdir dirname.
Current directory
Unlike with a graphical file browser, there is a concept of current
working directory: each shell is in a current directory. If you
ls
, it lists files in your current directory. If a program tries
to open a file, it opens it relative to that directory.
cd dirname
will change working directories for your current
shell. Normally, you will cd
to a working directory, and use
relative paths from there. /
alone refers to the root
directory, the parent of all files and directories.
cd ..
will change to the parent directory (dir containing this
dir). By the same token, ../..
the parent of the parent, and so
on.
Exercise: Change to some directory and then another. What do cd - and cd (with no arguments) do? Try each a few times in a row.
Online manuals for any command
man is the on-line manual; type man ls to get help on the ls command. The same works for almost any program. In general, you look for what you need rather than reading everything. The program that views the manual pages is (by default) less, which was described above: use q to quit or / to search (n and N to search again forwards and backwards).
--help or -h is a standard argument that prints short help directly: for example, cp --help.
Manual pages can be long; some are easy to read, some are impossible. tldr.sh is a project that collects simplified usage examples - see the tldr.sh interactive web viewer.
Exercise: Briefly look at the manual pages and --help output for the commands we have learned so far. How can you make rm ask before removing a file?
History and tab completion
Annoyed at typing so much? We’ve got two ways to make work faster.
First, each shell keeps its (shell) history. By pushing the up arrow key, you can access previous lines. Never type similar things twice, go up in history and find the previous line, modify it, then push enter to re-run.
Shells also have tab completion. Type the first few letters of any command or filename and push tab once or twice… it will either complete it or show you the options. This is so important that it’s used often, and many command arguments can also be completed.
Exercise: Play around with tab completion. Type pytho and push TAB. (Erase that, then start over.) Then type p and push TAB twice. (Erase that and start over.) Then type ls, a space, and the first few letters of a filename, then push TAB.
Variables
There are two kinds of variables in shell: environment variables and shell variables. You don't need to worry about the difference now. The $NAME or ${NAME} syntax is used to access the value of a variable.
For example, the environment variable HOME holds your home directory, for me /home/rkdarst. The command echo prints whatever its arguments are, so echo $HOME prints my home directory. (Note that the variable is a property of the shell, not of the echo command - this is sometimes important.)
To set a variable, use NAME=value. export NAME=value sets it as an environment variable, which means that other processes you start (from this shell) can use it.
The $VARIABLE syntax is also often used in examples: in that case, it isn't an environment variable, but just something you need to substitute yourself when running a command.
Quick reference
Cheatsheet
General notes
  The command line has many small programs that, when connected, allow you to do many things. Only a little bit of this is shown here.
  Programs are generally silent if everything worked, and only print an error if something goes wrong.
ls [DIR]
  List the current directory (or DIR, if given).
pwd
  Print the current directory.
cd DIR
  Change directory. .. is the parent directory, / is the root, and / also chains directories, e.g. dir1/dir2 or ../../
nano FILE
  Edit a file (there are many other editors, but nano is common, nice, and simple).
mkdir DIR-NAME
  Make a new directory.
cat FILE
  Print the entire contents of a file to standard output (the terminal).
less FILE
  less is a "pager" and lets you scroll through a file (up/down/pageup/pagedown). q to quit, / to search.
mv SOURCE DEST
  Move (=rename) a file. mv SOURCE1 SOURCE2 DEST-DIRECTORY/ moves multiple files to a directory.
cp SOURCE DEST
  Copy a file. The DEST-DIRECTORY/ syntax of mv works as well.
rm FILE ...
  Remove a file. Note, from the command line there is no recovery, so always pause and check before running this command! The -i option will make it confirm before removing each file. Add -r to remove whole directories recursively.
head [FILE]
  Print the first 10 lines (or N lines with -n N) of a file. Can take input from standard input instead of FILE. tail is similar, but for the end of the file.
tail [FILE]
  See above.
grep PATTERN [FILE]
  Print lines matching a pattern in a file, suitable as a primitive find feature or for quickly searching output. Can also use standard input instead of FILE.
du [-ash] [DIR]
  Print the disk usage of a directory. The default unit is KiB, rounded up to block sizes (1 or 4 KiB); -h means "human readable" (MB, GB, etc.), -s means "only of DIR, not all subdirectories also", and -a means "all files, not only directories". A common pattern is du -h DIR | sort -h to print all directories and their sizes, sorted by size.
stat
  Show detailed information on a file's properties.
find [DIR]
  find can do almost anything, but that means it's really hard to use well. Let's be practical: with only a directory argument, it prints all files and directories recursively, which might be useful by itself. Many of us do find DIR | grep NAME to grep for the name we want (even though this isn't the "right way"; there are find options which do the same thing more efficiently).
| (pipe): COMMAND1 | COMMAND2
  The output of COMMAND1 is sent to the input of COMMAND2. Useful for combining simple commands together into complex operations - a core part of the unix philosophy.
> (output redirection): COMMAND > FILE
  Write the standard output of COMMAND to FILE. Any existing content is lost.
>> (appending output redirection): COMMAND >> FILE
  Like above, but doesn't lose existing content: it appends.
< (input redirection): COMMAND < FILE
  The opposite of >: input to COMMAND comes from FILE.
type COMMAND or which COMMAND
  Show exactly what will be run for a given command (e.g. type python3).
man COMMAND-NAME
  Browse on-line help for a command. q will exit, / will search (it uses less as its pager by default).
-h and --help
  Common command line options to print help on a command. But, they have to be implemented by each command.
See also
The linux shell course has much more detail.
Software Carpentry has a basic shell course. Sections 1 to 3 cover in detail what is above (the rest is about shell scripting).
Explore manual pages
For some fun, look at the manual pages for cat, head, tail, and grep.
Linux shell course (advanced)
Read the Linux shell course and understand what "pipes" and "piping" are.
SSH
Secure Shell (SSH) is the standard program for connecting to remote servers and transferring data. It is very secure and well-supported, so it’s worth learning to use it properly. This page both gives a bit of a crash course (top) and more details (bottom) for all common connection methods.
Setup
Check the tabs below for your operating system and methods to see which method you want to use.
PowerShell is built in to Windows 10 and includes OpenSSH (the same as on Linux). Start the “Windows PowerShell” program. Then, follow the “Command line” instructions on most of this page if there isn’t a separate PowerShell tab. If you want to set up SSH keys there are a few differences but overall it is the same procedure.
This should work by default on recent Windows 10.
The Windows Subsystem for Linux lets you install a Linux operating system inside of Windows. This is what we recommend with Windows, if it works.
Install the Windows Subsystem for Linux and then use the “Command line” instructions. This will give you a top-level interface to scientific work on your computer.
This may not work if you do not have proper admin rights on your computer (e.g. if it is university managed). Ask your IT support.
This should only be used if the other methods don’t work.
PuTTY is a separate application that includes a terminal and SSH together. This used to be recommended before Windows 10. There aren’t detailed instructions below, but most of the ideas can be done with PuTTY somehow (except that SSH keys take more work).
MobaXterm is a separate application that allows SSH and also graphical applications. It's liked by some people, but it is freeware/commercial, so it isn't discussed much more here. TODO: someone could describe it more if they wanted.
SSH is built in to almost any Linux distribution. If it's not there, try installing the openssh-client package.
Start the Terminal application to follow the rest of the instructions. Then, follow the “Command Line” instructions on most of this page.
SSH should be built-in. Start the Terminal application. Then, follow the “Command Line” instructions on most of this page.
This guide uses Aalto University's HPC cluster as an example, but it should be applicable to other remote servers at Aalto, and to many outside servers as well.
Basic use: connect to a server
The standard login command with the command line is:
$ ssh USER@triton.aalto.fi
where USER is your username (Aalto: standard Aalto login, not email address) and triton.aalto.fi is the address of the server you wish to connect to - replace these for your situation.
First time login: check host key
When connecting to a new computer, you will be prompted to affirm that you wish to connect to this server for the first time. This lets you make sure you are connecting to the right computer (which is important if you type a password!). You’ll get a message such as:
The authenticity of host 'triton.aalto.fi (130.233.229.116)' can't be established.
ECDSA key fingerprint is SHA256:04Wt813WFsYjZ7KiAyo3u6RiGBelq1R19oJd2GXIAho.
Are you sure you want to continue connecting (yes/no)?
If possible, compare the key fingerprint you get to the one for the
machine which you can find online (Triton cluster:
Triton ssh key fingerprints, Aalto servers),
and if they do not match, please contact the server administrator
immediately. If they do match, type yes
and press enter. You
will receive a notice:
Warning: Permanently added 'triton.aalto.fi,130.233.229.116' (ECDSA) to the list of known hosts.
The public key that identifies Triton will be stored in the file ~/.ssh/known_hosts, and you shouldn't get this prompt again. You will also be asked to input your Aalto password before you are fully logged in. You want to say "yes, save the key for the future" - it's more secure, and you can always change it later if needed.
Checking known servers
You will not receive an authenticity prompt upon first login if the
server’s public key can be found in a list of known hosts. To check
whether a server, for example kosh.aalto.fi
, is known:
$ ssh-keygen -F kosh.aalto.fi
Your computer might come with some keys pre-loaded for your university’s computers, for example:
$ ssh-keygen -f /etc/ssh/ssh_known_hosts -F kosh.aalto.fi
SSH keys: better than just passwords
By default, you will need to type your password each time you wish to ssh into Triton, which can be tiresome, particularly if you regularly have multiple sessions open simultaneously. A more secure (and faster) way to authenticate yourself is to use an SSH key pair (this is public-key cryptography). The private key should be encrypted with a strong password; xkcd has good and amusing recommendations on the subject of passwords. This authentication method will allow you to log into multiple ssh sessions while only needing to enter your password once, saving you time and keystrokes.
Generate an SSH key
While there are many options for the key generation program ssh-keygen, here are the main ones:
-t : the cryptosystem used to make the unique key-pair and encrypt it
-f : the filename of the key
-C : a comment on what the key is for
Here are our recommended input options for key generation:
$ ssh-keygen -t ed25519
This works on Linux, MacOS, Windows
The PuTTYgen program can generate keys. We don’t go into more details right now. This provides a graphical application to generate keys and from here you would extract the OpenSSH format keys to copy to the servers.
Accept the default name of the key file by pushing enter with no extra text (it will be automatically used later). Then, you will be prompted to enter a password: PLEASE use a strong, unique password. Upon confirming the password, you will be presented with the key fingerprint, as both a SHA256 hex string and a randomart image. Your new key pair should be found in the hidden ~/.ssh directory (a directory called .ssh in your user's home directory).
Key type ed25519 makes a private key named ~/.ssh/id_ed25519 and a public key named ~/.ssh/id_ed25519.pub. The private key stays only on your computer; the public key goes to other computers. Other key types were common in the past, and you may need to change the filenames in some of the future commands (for example, ~/.ssh/id_rsa.pub).
Copy public key to server
In order to use your key pair to log in to a server (for example, the Triton cluster), you first need to securely copy the desired public key to the machine with ssh-copy-id. The script will also add the key to the ~/.ssh/authorized_keys file on the server. You will be prompted to enter your Aalto password to initiate the secure copy of the file to Triton.
$ ssh-copy-id -i ~/.ssh/id_ed25519.pub USER@triton.aalto.fi
If ssh-copy-id is not available, we have to also make the directory ourselves and make sure the file has the right permissions:
$ ssh USER@triton.aalto.fi "mkdir -p ~/.ssh ; chmod go-rwx ~/.ssh"
$ cat ~/.ssh/id_ed25519.pub | ssh USER@triton.aalto.fi "cat >> ~/.ssh/authorized_keys"
$ ssh USER@triton.aalto.fi "chmod go-rwx ~/.ssh/authorized_keys"
Manual method: connect to the system via some other method and get a shell. Copy the OpenSSH public key (it should be one line, though quite a long one). You'll want to paste the key as a new line in the file ~/.ssh/authorized_keys on the other server - a file in your home directory (~), in the .ssh directory.
From a terminal on the remote computer, you can run these commands to make a .ssh directory, edit the file, and set the permissions correctly. nano is a common editor; if it's not available, you need to use a different one:
$ mkdir -p ~/.ssh
$ nano ~/.ssh/authorized_keys
## Paste the key into that file and save.
$ chmod -R go-rwx ~/.ssh/
You can also edit .ssh/authorized_keys to manage your keys later.
With PuTTY, you'll need to grab the key from PuTTYgen and copy it to the remote server: copy the key from PuTTYgen and then use the "Manual" instructions above.
Connecting from outside of the Aalto network
Sometimes, you can't connect directly to the computer you need, since there is a jump host acting as some sort of firewall; you need to connect to that computer first. This is described below in the ProxyJump section, but we give a first workaround here.
All this is easier if you set up a config file with ProxyJump (-J) first, as described below, and copy keys one at a time. Once this is done, you can copy your key to kosh first, then to triton_via_kosh, for example.
Aalto University: If you can connect by VPN, or to Eduroam, then you can directly access the Triton cluster and copy your key like above.
First copy the key to the jump host (like kosh.aalto.fi), then copy it to your final destination (like triton.aalto.fi):
$ ssh-copy-id -i ~/.ssh/id_ed25519.pub USER@kosh.aalto.fi
$ ssh-copy-id -i ~/.ssh/id_ed25519.pub -o ProxyJump=USER@kosh.aalto.fi USER@triton.aalto.fi
Like before, if ssh-copy-id isn't available, we have to do extra steps to make sure the key has the right permissions - twice! You may need to enter your password many times here.
## Copy stuff to our jump host
$ ssh USER@kosh.aalto.fi "mkdir -p ~/.ssh ; chmod go-rwx ~/.ssh"
$ cat ~/.ssh/id_ed25519.pub | ssh USER@kosh.aalto.fi "cat >> ~/.ssh/authorized_keys"
$ ssh USER@kosh.aalto.fi "chmod go-rwx ~/.ssh/authorized_keys"
## Copy stuff to the real destination
$ ssh -J USER@kosh.aalto.fi USER@triton.aalto.fi "mkdir -p ~/.ssh ; chmod go-rwx ~/.ssh"
$ cat ~/.ssh/id_ed25519.pub | ssh -J USER@kosh.aalto.fi USER@triton.aalto.fi "cat >> .ssh/authorized_keys"
$ ssh -J USER@kosh.aalto.fi USER@triton.aalto.fi "chmod go-rwx ~/.ssh/authorized_keys"
Login with SSH key
If the key is in one of the standard filenames, it should work directly.
SSH key agent
To avoid having to type the decryption password, the private key needs to be added to the ssh-agent.
On Windows, you will need administrative permissions to be able to start an ssh-agent on your machine that can store and handle passwords:
Open Services from the start menu.
Scroll down to OpenSSH Authentication Agent > double click.
Change the Startup type to Automatic (Delayed Start), or anything that is not Disabled, then Apply, and also start the service manually if it is not yet running.
Then run ssh-add to add the default key (to add a certain key, use ssh-add ~/.ssh/id_ed25519, for example).
The program Pageant ("PuTTY agent") can unlock your keys once and give them to PuTTY each time they are needed. You can add keys and manage it from the small icon in the system tray. TODO: more instructions on using Pageant.
On Linux, SSH is likely to automatically save the key the first time you use it, so that you don't have to enter your key's password multiple times. If not, this will probably add it:

$ ssh-add

(You'll get a message if ssh-agent is not running. In this case, to start a new agent, use eval $(ssh-agent). It'll only work for this one shell; check the rest of the Internet for how to do more.) TODO: is any more needed?
On a Mac, you can also store the key in the keychain:

$ ssh-add --apple-use-keychain ~/.ssh/id_ed25519
Once the password is added, you can ssh as normal but will immediately be connected without any further prompts for passwords.
ProxyJump
Often, you can’t connect directly to your target computer: you need to
go through some other firewall host. This is often done with two
separate ssh
commands, but can be done with only one with the
-J
(ProxyJump) option:
$ ssh -J FIREWALL.aalto.fi triton.aalto.fi
Both of these can take more options, for example if you need to specify your username you might need to do it twice:
$ ssh -J USER@FIREWALL.aalto.fi USER@triton.aalto.fi
Read more details at https://www.redhat.com/sysadmin/ssh-proxy-bastion-proxyjump, including putting this in your configuration file (or see below).
(Windows with PuTTY: Connection > Proxy > Proxy type = “SSH to proxy and use port forward”, then enter the firewall host as “Proxy hostname” and port 22.)
Multiplexing
Connections can be even faster: you can re-use existing connections to start new ones, so that future ssh commands to the same host are almost instant. This multiplexes across the same connection, and is controlled by ControlMaster, ControlPath, and ControlPersist. With a proper SSH key setup the gain is minimal, but it can be useful sometimes. We don’t recommend using this unless you really want it, since there are some gotchas:
Connections hanging (e.g. unstable network, changing network) will cause all multiplexed connections to hang.
All multiplexed connections need to stop before the master process (first SSH connection) will stop. So if you try to exit the first SSH but child processes are using it, it will appear to hang - this may not be obvious.
If you are using with ProxyJump, there are two possible SSH processes which can hang and cause things to go wrong.
Only use this on your own computers that you control, for security reasons.
This works with OpenSSH. If you want to use this, add ControlMaster auto and ControlPath /tmp/.ssh-USER-mux-ssh-%r@%h:%p (replacing USER with your username) to your ssh config file (see below), and test well. You might want ServerAliveInterval 30 to kill stuck connections quickly if the network goes down. We don’t give a full example, to prevent unintended problems. If you notice weird things happening with your ssh, point your helpers to this section.
Config file: don’t type so many options
Remembering the full settings list for the server you are working on each time you log in can be tedious. An ssh config file allows you to store your preferred settings and map them to much simpler login commands. To create a new user-restricted config file:
## Linux/Mac
$ touch ~/.ssh/config && chmod 600 ~/.ssh/config
## Windows (PowerShell)
$ New-Item ~/.ssh/config
Open the created file to edit it as indicated below.
For a new configuration, you need to specify in config at minimum the:
Host: the name of the settings list
User: your login name when connecting to the server (if different from the username on your computer)
Hostname: the address of the server
So for the simple Triton example, it would be:
# Configuration file for simplifying SSH logins
#
# HPC slurm cluster
Host triton
User LOGIN_NAME
Hostname triton.aalto.fi
and you can use only this command to log in from now on:
$ ssh triton
Any additional server configs can follow the first one, and each must start by declaring the configuration Host:
# general login server
Host kosh
User LOGIN_NAME
Hostname kosh.aalto.fi
# light-computing server
Host brute
User LOGIN_NAME
Hostname brute.aalto.fi
There are optional ssh settings that may be useful for your work, such as:
# Turn on X11 forwarding for Xterm graphics access
ForwardX11 yes
# Connect through another server (eg Kosh) if not connected directly to Aalto network
ProxyJump USER@kosh.aalto.fi
Full sample config file
The following code is placed in the config file created above (i.e. ~/.ssh/config on Mac/Linux or %USERPROFILE%/.ssh/config on Windows):
# general login server
Host kosh
User LOGIN_NAME
Hostname kosh.aalto.fi
# Triton, via kosh
Host triton_via_kosh
User LOGIN_NAME
Hostname triton.aalto.fi
ProxyJump kosh
Now, you can just run commands such as:
$ ssh triton_via_kosh
$ rsync triton_via_kosh:/m/cs/scratch/some_file .
## And this works in any other tool that uses ssh.
directly, by using the triton_via_kosh alias. Note that the Triton rule uses the name kosh, which is defined in the first part of the file.
References
man ssh details the SSH command-line options
man ssh_config details all of the config file options
https://www.mn.uio.no/geo/english/services/it/help/using-linux/ssh-tips-and-tricks.html - long-form guide
https://blog.0xbadc0de.be/archives/300 - long-form guide
https://www.phcomp.co.uk/Tutorials/Unix-And-Linux/ssh-passwordless-login.html
https://linuxize.com/post/ssh-command-in-linux/#how-to-use-the-ssh-command
https://linuxize.com/post/how-to-setup-passwordless-ssh-login/
https://www.ssh.com/ssh/ - commercial site
The Zen of Scientific computing
Have you ever felt like all your work was built as a house of cards, ready to crash down at any time?
Have you ever felt that you are far too inefficient to survive?
No, you’re not alone. Yes, there is a better way.
Production code vs research code
Yes, many things about software development may not apply to you:
Production code:
you sort of know what the target is
code is the main result
must be maintainable for the future
Research code:
you don’t know what the target is
code is secondary
But research code often becomes important in the future, so not all can be an unmaintainable mess…
Research code pyramid
I know that not all research code will be perfect.
But if you don’t build on a good base, you will end up with misery.
Yes, you can’t do everything perfectly
Not everything you do will be perfect. But it has to be good enough to:
be correct
be changed without too much difficulty
be run again once reviews come in
ideally, not be wasted once you do something new
Even as a scientist, you need to know the levels of maturity so that you can do the right thing for your situation.
It takes skill and practice to do this right. But it is part of being a scientist.
This talk’s outline:
Describe different factors that influence code quality
Describe what the maturity levels are and when you might need them
What aspects can you improve?
Below are many different aspects of scientific computing which you can improve.
Some are good for everyone. Some you may not need yet. Different levels of maturity are presented for each topic, so that you can think about what is right for you.
Version control
Version control allows you to track changes and progress.
For example, you can figure out what you just broke or when you introduced a bug. You can always go back to other versions.
Version control is essential to any type of collaboration.
L0: no version control
L1: local repo, just commit for yourself
L2: shared repo, multiple collaborators push directly
L3: shared repo, pull-request workflow
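Getting to level L1 above takes only a few standard git commands (the filename is just an example):
$ git init                   ## start tracking the current directory
$ git add analysis.py        ## stage a file to be recorded
$ git commit -m "First working version"
$ git log                    ## the history you can always go back to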
Resources:
Github, CodeRefinery Gitlab, your institution’s equivalent, and many more.
CodeRefinery lessons (git-intro and git-collaborative)
Software Carpentry Git-novice lesson
Modular code
Modularity is one of the basic prerequisites to be able to understand, maintain, and reuse things - and also hard to get right at the beginning.
Don’t worry too much, but always think about how to make things reusable.
L0: bunch of copy-and-paste scripts
L1: important code broken out into functions
L2: separation between well-maintained libraries and daily working scripts.
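As a small shell-flavored sketch of level L1 above (all names hypothetical): the same steps as a copy-and-paste script, but broken into named functions you can call and reuse:
#!/bin/bash
## Each pipeline step is a named function instead of pasted commands.
preprocess () {
    ## drop comment lines from the raw input
    grep -v '^#' "$1" > cleaned.txt
}
summarize () {
    ## report how many records survived preprocessing
    wc -l cleaned.txt
}
preprocess data.txt
summarize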
Resources:
CodeRefinery Modular Code Development lesson
Organized workspaces
You will need to store many files. Are they organized, so that you can find them later, or will you get lost in your own mess?
L0: no particular organization system
L1: different types of data separated (original data/code/scratch/outputs)
L2: projects cleanly separated, named, and with a purpose
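A minimal shell sketch of level L1 (the directory names are just one possible convention):
$ mkdir -p myproject/{data,code,scratch,outputs}
Keeping original data separate from generated files means you can always delete and regenerate scratch and outputs without fear.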
Resources:
I don’t know of good sources for this, but you can find various recommendations for organizational systems online.
Workflow/pipeline automation
When you are doing serious work, you can’t afford to just manage stuff by hand. Task automation allows you to do more faster.
Something such as make
can automatically detect changed input
files and code and automatically generate the outputs.
L0: bunch of scripts you have to run and check output of by hand.
L1: hand-written management scripts, each output can be traced to its particular input and code.
L2: make or other workflow management tool to automate things
L3: full automation from original data to final figures and data
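As a sketch of the hand-written level L1 above (all filenames hypothetical), a management script can rebuild an output only when its input or code has changed - the same check that make automates:
#!/bin/bash
## -nt means "newer than": rebuild only if the data or the code changed.
if [ data/raw.csv -nt results/clean.csv ] || [ scripts/clean.py -nt results/clean.csv ]; then
    python3 scripts/clean.py data/raw.csv > results/clean.csv
fi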
Resources:
CodeRefinery Reproducible Research lesson
Reproducibility of environment
Is someone else able to (know and) install the libraries needed to run your code? Will a change in another package break your code?
Scientific software is notoriously bad at managing its dependencies.
L0: no documentation
L1: state the dependencies somewhere, tested to ensure they work
L2: pin exact versions used to generate your results
L3: containerized workflow or equivalent
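For Python, for example, level L2 above can be as simple as recording and reusing exact versions with pip (assuming a pip-based environment):
$ pip freeze > requirements.txt       ## record the exact versions you used
$ pip install -r requirements.txt     ## recreate the same environment later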
Resources:
CodeRefinery Reproducible Research lesson
Documentation
If you don’t say what you do, there’s no way to understand it. You won’t be able to understand it later, either.
At minimum, there should be some README files that explain the big picture. There are fancier systems, too.
L0: nothing except scattered code comments
L1: script-level comments and docstrings explaining overall logic
L2: simple README files explaining big picture and main points (example)
L3: dedicated documentation including tutorials, reference, etc.
Resources:
CodeRefinery Documentation lesson
Testing
You have to test your code at least once, when you first run it. How do you know you won’t break something later?
Testing gives you a way to ensure things always work (and are correct) in the future by letting you run every test automatically.
There’s nothing more liberating than knowing “tests still pass, I didn’t break anything”. It’s extremely useful for debugging, too.
L0: ad-hoc and manually
L1: defensive programming (assertions), possibly some test data and scripts
L2: structured, comprehensive unit/integration/system tests (e.g. pytest)
L3: continuous integration testing on all commits (e.g. Github Actions)
If code is easy to test, it is usually easy to reuse, too: making code testable tends to make it reusable.
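A shell sketch of level L1 above (all names hypothetical): a tiny script that runs your code on a small known input and complains if the output ever changes:
#!/bin/bash
## Run the analysis on known input and compare to a saved reference output.
python3 scripts/clean.py tests/small_input.csv > /tmp/test_output.csv
if ! diff -q tests/expected_output.csv /tmp/test_output.csv; then
    echo "TEST FAILED: output differs from the reference" >&2
    exit 1
fi
echo "Test passed"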
Resources:
CodeRefinery Testing lesson:
Licensing
You presumably want people to use your work so they will cite you. If you don’t have a license, they won’t (or they might and not tell anyone).
Equally, you want to use other people’s work. You need to check their licenses.
L0: no license given / copy and paste from other sources
L1: license file in repo / careful to not copy incompatible code
L2: license tracked per-file and all contributors known.
Resources:
CodeRefinery Software social coding
Distribution
Code can be easy to reuse, but not easy to get. Luckily there are good systems for sharing code.
L0: code not distributed
L1: code provided only if someone asks
L2: code on a website
L3: version control system repo is public
L4: packaged, tagged, and versioned releases
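The tagging-and-versioning part of L4 costs almost nothing to start (the version number is hypothetical):
$ git tag -a v1.0.0 -m "First public release"
$ git push origin v1.0.0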
Resources:
Python: Packaging tutorial
Similar for any other language you may use
Reuse
Are you aware of what others have already figured out through their great effort?
Choosing the right thing to build on is not always easy, but you must do it anyway.
L0: reinvent everything yourself
L1: use some existing tools and libraries
L2: deep study of existing solutions and tools, reuse them when appropriate
Resources:
I don’t know where to refer you to right now.
Collaboration
Is science like monks working in their cells, or a community effort?
These skills move so fast that learning peer-to-peer is one of the best ways to do it.
There’s a whole other art of applying these skills which isn’t taught in classes.
If you don’t work together, you will fall behind.
L0: you work alone and re-invent everything
L1: you occasionally talk about results or problems
L2: collaborative package development
L3: code reviews, pair programming, etc.
L4: community project welcoming other contributors
Resources:
Most every CodeRefinery lesson
Plenty more
The future
Science with computers can be extremely enjoyable… or miserable.
We are here to help you. You are here to help others.
Will we?
Practical git PRs for small teams
This is the prototype of a mini-course about using git for pull requests (PRs) within small teams that are mostly decentralized, perhaps don’t have test environments everywhere, and thus standard review and CI practices don’t directly apply. The audience is expected to be pretty good with git already, but wondering how PRs apply to them.
The goal isn’t to convince you to use PR-based workflows no matter the cost, but instead to think about how the tech can make your social processes better.
Status: Alpha-quality, this is more a start of a discussion than a lesson. Editor: rkdarst
Learning objectives
Why use pull requests?
What are the typical procedures of using PRs?
How do we adapt our team to use them?
How does this improve our work?
Why pull requests?
pull request = change proposal
You have some work which should be reviewed before deploying.
Someone is expected to give useful feedback
Maybe it’s a quick idea: easier to draft and discuss than to talk about it abstractly
pull request = review request
You’ve made the change already, or you are already the expert so don’t expect it to really be debated.
You edited it in deployment, or it is already live
Or you are the expert, and others don’t usually give suggestions
Still, someone might have some comments to improve your integration with other services.
pull request = change announcement
You don’t expect others to ever make suggestions
But you think others should know what you are doing, to distribute knowledge
If no one comments, you might merge this yourself in a few hours or days.
pull request = CI check
You want the automated tests/ continuous integration (CI) to run to verify the change works.
If it works, you might merge yourself even without others knowing.
A bit safer than CI after the push to master.
Benefits of PRs
Multiple sets of eyes
Everything should be seen by multiple people to remove single point of failure problems.
Share knowledge about how our services work.
Encourages writing a natural-language description of what you are doing - clarify purpose to yourself and others
Suggestion or draft
Unsure if good idea, make a draft to get feedback
Discuss and iterate via issue. No pressure to make it perfect the first time, so writing is faster
CI
Run automated tests before merging
Requires a test environment
Very important for fast and high-quality development.
Discussion
Structured place for conversation about changes
Refer to and automatically close issues
How do you make a pull request
Technically, a pull request is:
A git branch
Github/Gitlab representation of wanting to merge that head branch into some base branch (probably the default branch).
Discussion, commenting, and access control around that
So, there’s nothing really magic beyond the git branch.
We don’t really need to repeat existing docs: you can read how to do this on Github, Gitlab, etc. yourself.
A PR starts with a branch pushed to the remote.
Then, the platform registers a pull request, which means “I want to merge this branch into master”. (Yes, the name is a bit misleading.) Go to the repo page and you’ll see a button, or a link to make one is printed when you push.
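In plain git terms, that’s all it takes (branch name and message are just examples):
$ git checkout -b fix-deploy-docs     ## a branch for the change
$ git commit -am "Clarify the deployment steps"
$ git push -u origin fix-deploy-docs  ## push it; the platform now offers to open a PR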
git-pr makes it easy - fewest possible keystrokes, no web browser needed, and I use the commit message also as the PR message to save even more time.
Pull request description
These days, I (rkdarst) tend to write my initial PR message into my commit, and then git-pr will use that when I push. This also stores the description permanently in the git history.
There is also the concept of “pull request templates” within Github/Gitlab. (They can keep changes organized, provide checklists, and keep things moving. But after fast small PRs via git-pr, I really don’t like templates being required for small changes where I can write the important aspects myself.)
What should go in a description:
Why are changes being made?
What are the changes?
Risks, benefits, etc…
Is it done or a work in progress? Need help?
What should be reviewed?
CI checks
CI pipelines can run on the pull request and will report failures. On Github, success is a green check. The same checks can be shared with direct pushes.
Even if there aren’t tests, syntax checks and similar could be useful.
Semantics around PRs
How do you actually review and handle a PR once it comes in? What’s the social process?
Actions you can take
Actions you can do from the web (Github):
merge: accept it
comment: add a message
approve/request changes: “review” you can do from “file list” view
line comments (*): from diff view, you can select ranges of lines and comment there
suggestions (*): from diff, you can select ranges of lines then click “suggest” button to make a suggestion. This can easily be applied from web.
commit suggestion (*): from diff view, you can accept the suggestion and it makes a commit out of it.
(*) items can be done in batch from file view, to avoid one email for every action.
draft pull request: can’t be merged yet. There is a Github flag for this, or sometimes people prefix the title with WIP:.
assign a reviewer: request people to do the review, instead of waiting for someone to decide themselves.
close: reject the change and mark the PR as closed.
My usual procedure
If it’s good as-is, just click “merge”
If it’s a new contributor I usually try to say some positive words, but in long-term efficient mode, I don’t see a need to.
Otherwise, comment in more detail. Line-based comments are really useful here; you can also give an overall “approve”, “request changes”, or “comment” on the PR as a whole (see above).
If you aren’t sure if you are supposed to merge it (yet), but it looks good, just “approve” it.
This can be a sign to the original author that it looks sane to you, and they merge when they are ready.
If someone marks my PR “approve” but doesn’t merge it themselves, I will merge it myself as soon as I am ready.
If someone else requested changes, I’ve made the changes (if I agree), and I think there’s not much more to discuss, I will just merge it myself without another round of review.
You can both make suggestions and approve (usually with some words saying there’s no need to accept the suggestions if they don’t make sense).
How do humans use PRs?
Who should merge them?
What happens when the person making the PR is the only one (or main one) who can give it a useful review?
Then, perhaps your team needs some redundancy…
You can assign reviewers, if you want to suggest who should take a look.
Discuss as part of your team for each project. This leads to a social discussion of “how do we collaborate in practice?”
When do you merge a pull request?
How much review do you need to give, if you aren’t the expert?
My proposal:
If you aren’t the author, and can evaluate it, merge it ASAP
If you aren’t an expert, but no one else has merged it after a few days, merge it yourself. Or if you are the original author and need it.
If no one else has merged it after a week, anyone does it (mainly relevant to external contributors).
I don’t feel bad making a PR if I expect I will be the one to merge it a few days later: at least I gave people a chance to take part.
How do you keep up to date with PRs?
How can our team adapt to PRs?
Traditional software project or utility
PRs make a lot of sense
Deployments: There is no testing environment!
Yes, there should be a test environment, but let’s be real: many things start off too small to have one. What do we do about it?
“If the change has already been made, it’s not really a change proposal”
PRs don’t work too well here, but when you think about it, it would be nice to be able to test before deploying!
Maybe this gives us encouragement to use more PRs
Make a PR anyway, even though it’s already in production, as a second-eyes formality.
All of our projects are independent
Is this good for knowledge transfer?
What advantages would we see with more PRs?
Other
These things can make our work a bit smoother, and are something we can discuss.
git-pr
I got annoyed at needing too many keystrokes, and having to go to a web browser to create the pull requests
I created git-pr to make this as fast as possible, and it really does feel much smoother now
Works equally for Github and Gitlab, at least.
Cheatsheets: git the way you need it, Gitlab (produced by Gitlab, with Aalto link)
Training
We have various recommended training courses for researchers who deal with computation and data. These courses are selected by researchers, for researchers and grouped by level of skill needed.
Training
Scientific computing and data science require special, practical skills in programming and computer use. However, these aren’t often learned in academic courses. This page is your portal for getting these skills. The focus is practical, hands-on courses for scientists, not theoretical academic courses.
Scientific Computing in Practice
SCIP is a lecture series at Aalto University which covers hands-on, practical scientific computing related topics. Lectures are open to the entire Aalto community as well as our partners in the FGCI consortium.
Examples of topics covered at different lectures: HPC crash course, Triton kickstarts, Linux Shell, Parallel programming models: MPI and OpenMP, GPU computing, Python for scientists, Data analysis with R and/or Python, Matlab and many others.
If you are interested in a re-run of our past courses or if you want to suggest a new course, please take this survey.
August 2023 / Linux Shell Basics
Updates
10/8 - Registration is now open at https://link.webropol.com/ep/linuxshell2023
The material: aaltoscicomp.github.io/linux-shell
YouTube playlist of all days: https://www.youtube.com/playlist?list=PLZLVmS9rf3nPRb-QjrWsg_fTUfJ5Bbv07
Q&A (“notes”) for all days: https://hackmd.io/@AaltoSciComp/shellbasics2023
Part of Scientific Computing in Practice lecture series at Aalto University.
Audience: Scientists, researchers, and others looking for an extensive intro into Linux shell / terminal. Primary audience is academics in Finland, outsiders are welcome to register and are accepted if there is space (there always is space).
About the course: The Linux shell lets you work efficiently on remote computers and automate bigger projects - whether you are managing a lot of data or running programs on a computer cluster. Without it, you are often stuck when you need to move beyond basic tools like Jupyter notebooks. This course covers the Bash shell, but the principles apply to other shells such as zsh.
This course will cover the basics so that you’ll know what the shell is, are comfortable using it for your own projects, and are able to attend other courses with the shell as a prerequisite. We’ll get familiar with the command line, files and directories, and other things you often find in shell environments. We will unleash the power of that blinking cursor in the terminal window. Windows/Mac/Linux users are warmly welcome - regardless of what you use on your desktop, you’ll need this when using more powerful remote computers.
We will start with the basics, like files and processes, and go up to command-line magic like redirections and pipes. This should be enough to get started with the Linux terminal.
There is an advanced part of this course given later in the spring which will go through scripting and automation in more detail (part 2 in the material).
Lecturer: Ivan Tervanto, D. Sc., Science IT / Department of Applied Physics, Aalto University
Time, date, place: the course consists of three hands-on sessions (3h each), via Zoom. An on-site option is possible if there is enough interest from course participants.
On-site: (not given this year)
Zoom: link to be posted to the registered participants list
Tue 29.8 12:00-15:00
Wed 30.8 12:00-15:00
Thu 31.8 12:00-15:00
Course material: will be mostly based on the first part of aaltoscicomp.github.io/linux-shell.
Cost: Free!
Registration: Please register here
Credits/certificate: It is not possible to obtain certificates or credits for this course.
Required setup: During the tutorials we’ll use a terminal with a Bash shell, which means you need either a Linux/Mac computer, a Windows PC with Git Bash, VDI to a Linux machine, or an SSH client for accessing a Linux server. If you are at Aalto University you can run ssh USERNAME@kosh.aalto.fi to connect to a native Linux shell. Other servers are listed here. If you are at the University of Helsinki, see the list of available SSH Linux servers at this link. We will cover ssh connections at the beginning of the first day.
Additional course info at: scip -at- aalto.fi
What’s next?: After this course, check out CodeRefinery, 19-21 and 26-28 September 2023. CodeRefinery is the next step in scientific programming: it doesn’t teach programming itself, but the tools to do it comfortably and without wasting time on problems.
Nov 7th - Nov 10th 2023 / Python for Scientific Computing
News and Important info
Thank you to all who came! The material will stay available indefinitely, and we hope it’s useful to you in the future, too. Please tell others about the course, and if you want to help us make more, get in touch.
Links
Register here to get emails about the course - registration is not necessary to attend. The course is open to everyone in the world, with some partners providing special services (see below).
Video playlist from 2022 (this year is above)
This is a medium-advanced course in Python tools such as NumPy, SciPy, Matplotlib, and Pandas. It is suitable for people who know basic Python and want to know some internals and important libraries for science - basically, how a typical scientist actually uses Python. Read the learner personas to see if the course is right for you. Prerequisites include basic programming in Python.
Part of Scientific Computing in Practice lecture series at Aalto University, in partnership with CodeRefinery.
Partners
This course is hosted by Aalto Scientific Computing (Aalto University, Finland) and CodeRefinery. Our livestream, registration, materials, and published videos are free for all in the spirit of open science and education, but certain partners provide extra benefits for their own audience.
Staff and partner organizations:
Radovan Bast (CodeRefinery, The Arctic University of Norway) (instructor, helper)
Richard Darst (ASC, Aalto University) (instructor, instructor coordinator, director)
Enrico Glerean (ASC, Aalto University) (instructor, registration coordinator, communication, helper)
Johan Hellsvik (PDC, NAISS, KTH) (instructor, helper)
Diana Iusan (UPPMAX, NAISS, Uppsala University) (instructor, helper)
Thomas Pfau (ASC, Aalto University) (instructor, helper)
Jarno Rantaharju (ASC, Aalto University) (instructor, helper)
Teemu Ruokolainen (ASC, Aalto University) (instructor, helper)
Sabry Razick (University of Oslo) (instructor, helper)
Simo Tuomisto (ASC, Aalto University) (instructor, helper)
Practical information
Registration
This is an online course streamed via Twitch (the CodeRefinery channel) so that anyone may follow along without registration. You do not need a Twitch account. There is a collaborative notes link which is used for asking questions during the course. The actual material is here.
While the stream is available even without providing personal data, if you register you may get collaborative notes access for asking questions and will support our funding by contributing to our attendance statistics.
Credits
It is possible to obtain a certificate from the course with a little extra work. The certificate is equivalent to 1 ECTS and your study supervisor will be able to register it as a credit in your university study credit system. Please make sure that your supervisor/study program accepts it.
Learners with a valid Aalto student number will automatically get the credit registered in Aalto systems.
To obtain a certificate/credit, we expect you to have registered for the course by 10/11/2023, to follow the 4 sessions, and to provide us with at least the following 5 documents via email (1 text document, 4 or more python scripts/notebooks). Please remember to add your name and surname to all submitted files. If you are a student at Aalto University, please also add your student number.
1 text document (PDF or txt or anything for text): For each of the 4 days, write a short paragraph (learning diary) to highlight your personal reflections about what you have found useful, which topic inspired you to go deeper, and more in general what you liked and what could be improved.
4 (or more) .py scripts/notebooks: For each of the 4 days take one code example from the course materials and make sure you can run it locally as a “.py” script or as a jupyter notebook. Modify it a bit according to what inspires you: adding more comments, testing the code with different inputs, expanding it with something related to your field of research. There is no right or wrong way of doing this, but please submit a python script/notebook that we are eventually able to run and test on our local computers.
These 5 (or more) documents should be sent before 31/December/2023 23:59 CET to scip@aalto.fi. If the evaluation criteria are met for each of the 5 (or more) documents, you will receive a certificate by mid January 2024. Please note that we do not track course attendance; if you missed one session, recordings will be available on Twitch immediately after the streaming ends.
NEW! Credit fast track: if you submit your homework by 17/November/2023 23:59CET, you get the credit/certificate before 30/Nov. If you submit after the 17/Nov deadline, your credit/certificate will be processed in January (see previous paragraph).
Additional course info at: scip -at- aalto.fi
Schedule
The course consists of four online hands-on sessions, 3h each. All times are EET (convert 9:50 to your timezone). The schedule is tentative; we may run earlier or later, so join early if attending a single lesson.
Warning
Timezones! Times on this page are in the Europe/Helsinki timezone. In Central Europe, the course starts at 8:50! (convert 9:50 Helsinki to your timezone)
(week before) Installation help sessions (for sites that offer them)
Please connect to all sessions 10 minutes early: icebreakers and the intro already start then.
Tue 7.nov, 9:50-13:00
10:00 Intro
10:15 Jupyter
11:00 NumPy (Mostly the basic lesson, but we might touch also topics from Advanced NumPy ).
12:10 pandas…
Wed 8.nov, 9:50-13:00
10:00 pandas continued
10:30 matplotlib
12:10 data formats
12:20 productivity tools
Thu 9.nov, 9:50-13:00
10:00 scripts
11:00 library ecosystem
11:10 dependency management
11:10 binder
Fri 10.nov, 9:50-13:00
Preparation
Prerequisites include basic programming in Python.
Software installation:
See the installation page of the course material.
In principle, if you are at Aalto, the service https://jupyter.cs.aalto.fi should be sufficient to do most of this course without any local installations. Perhaps not everything, but it will be OK for most people.
Zoom, if you are registered for one of the exercise sessions.
Mental preparation: Online workshops can be a productive format, but it takes some effort to get ready. Browse these resources:
Attending a livestream workshop, good to read in detail.
How to use HackMD to ask questions and hold discussions.
It is useful to watch or read the Linux shell crash course, since these basic command line concepts are always useful.
Community standards
This is a large course, and we will have many diverse groups attending it. There will be people attending at all different levels, from “just learned Python” to “been using Python for a while and want to see some tips and tricks”. Everyone will choose their own path; some people will be more hands-on, others more “watching”. Everyone is both a teacher and a learner. Even our instructors are always learning things and making mistakes (and this is part of the point!). Please learn from our mistakes, too!
This course consists of lectures, hands-on exercises, and demos. It is designed to have a range of basic to advanced topics: there should be something for everyone.
The main point of this course is the exercises. If you are with a group, we hope people will work together and help each other. We expect everyone to help each other as best they can, with respect for different levels of knowledge - and at the same time be aware of your own limitations. No one is better than anyone else; we just have different existing skills and backgrounds.
If there is anything wrong, tell us - HackMD is best. If you need to contact us privately, you can message the host on Zoom, reach instructors via the CodeRefinery chat, or email CodeRefinery support. This could be as simple as “speak louder / the text on screen is unreadable” or that someone is creating a harmful learning environment.
Code of Conduct
We are committed to creating a friendly and respectful place for learning, teaching, and contributing. You can read our Code of Conduct here. If you need to report any violation of the code of conduct, you can email the organisers at scip _at_ aalto.fi, alternatively you can also use this web form.
Material
Contact
Registration inquiries: scip -at- aalto.fi
Other organizations who want to join as a partner: scip -at- aalto.fi
Chat with us on CodeRefinery chat (anyone) or Aalto University scicomp chat
See also
January 2024 / Linux Shell Scripting
Part of Scientific Computing in Practice lecture series at Aalto University.
Audience: Anyone with intermediate or advanced level in Linux shell.
About the course: You might have already used Linux shell commands interactively, but how do you go from interactive terminal use to non-interactive workflows with scripts? This course is oriented toward those who want to start using Bash programming fully and use the terminal efficiently.
We expect that course participants are familiar with the shell basics (experience with Bash, zsh, etc.). We briefly touch on Part 1 of the Linux Shell tutorial and continue to Part 2. We do expect that participants know how to create a directory and can edit a file from the Linux shell command line. We will be scripting a lot, and there will be lots of demos and real practicing.
Lecturer: Ivan Degtyarenko, D. Sc., Science IT / Department of Applied Physics, Aalto University
Place: Online and in-person at Room U135a (U7) Otaniemi (in-person only if there are enough participants). Please register to receive the streaming link and other info about the in-person sessions.
Time, date (all times EET):
Tue 16.01 12:00-15:00
Wed 17.01 12:00-15:00
Thu 18.01 12:00-15:00
Course material: will be mostly based on the second part of the Linux shell tutorial. Videos are archived at this playlist
Registration: You can register at this link
Credits and certificates: We do not provide credits or certificates for this course.
Setup instructions: For the online course we expect you to have the Zoom client installed on your local workstation/laptop, and access to a Linux-like shell terminal. You can check Bash installation instructions for various operating systems at this link. If needed, participants can be provided with access to the Triton HPC cluster for running examples.
Additional course info at: scip -at- aalto.fi
Tuesday Tools & Techniques for High Performance Computing
Quick links
Watching link: https://twitch.tv/coderefinery
Materials

Do you use supercomputers in your research work? Are you curious about making your computing faster and more efficient? Join us for TTT4HPC: four self-contained episodes on best practices in High Performance Computing. This is a great chance to enhance your computational skills. What you will learn is also used a lot outside academia whenever large scale computations are needed.
The course happens online. Morning lectures (2h) are streamed via Twitch; afternoon hands-on exercises (1.5h) happen on Zoom with our HPC experts.
Below you will find the list of episodes and how to register. Episodes are self-contained, so you can join only the episodes that are useful for your research.
Episode 1 - 16/04/2024 - HPC Resources: RAM, CPUs/GPUs, I/O
Content: focus on HPC computational resources, starting with understanding and managing memory, CPUs, and GPUs, monitoring computational processes and I/O, utilizing local disks and ramdisks, and extending into benchmarking and selecting job parameters.
Instructors: Jarno Rantaharju, Radovan Bast, Diana Iusan, Simo Tuomisto
Learning materials: Managing resources on HPC
Registration: Please register at this link
Schedule for the day in the EEST (Helsinki, Oslo+1) timezone:
09:50-10:00 Streaming starts with icebreakers https://www.twitch.tv/coderefinery
10:00-12:00 Episode 1 - HPC Resources
Job scheduling and Slurm basics
How to choose the number of cores by timing a series of runs
Measuring and choosing the right amount of memory
I/O Best Practices
12:00-13:00 Lunch (on your own)
13:00-14:30 Hands-on exercises on zoom (register to receive link)
How to attend: You can watch the streaming at https://www.twitch.tv/coderefinery, but you need to register to get access to the shared document for questions and answers, and the zoom room for the afternoon session.
Episode 2 - 23/04/2024 - Day-to-day working on clusters
Content: focus on software development on HPC, syncing data, interactive work with HPC, vscode
Learning materials: coming soon
Registration: Please register at this link
Schedule for the day in the EEST (Helsinki, Oslo+1) timezone:
09:50-10:00 Streaming starts with icebreakers https://www.twitch.tv/coderefinery
10:00-12:00 Episode 2 - Day-to-day working on clusters
Syncing data and code
Developing and interacting with HPC
Using VScode with HPC clusters
12:00-13:00 Lunch (on your own)
13:00-14:30 Hands-on exercises on zoom (register to receive link)
Episode 3 - 07/05/2024 - Containers on clusters
Content: focus on containers with Apptainer/Singularity, how to build containers for HPC, how to work with the filesystem, other practical examples with containers
Learning materials: coming soon
Registration: Please register at this link
Schedule for the day in the EEST (Helsinki, Oslo+1) timezone:
09:50-10:00 Streaming starts with icebreakers https://www.twitch.tv/coderefinery
10:00-12:00 Episode 3 - Containers on clusters
Intro to containers on HPC
Using Apptainer/Singularity in practice
Advanced cases for containers in HPC
12:00-13:00 Lunch (on your own)
13:00-14:30 Hands-on exercises on zoom (register to receive link)
Episode 4 - 14/05/2024 - Parallelization and workflows
Content: focus on parallelization with HPC, efficient parameter sweeps, workflow automation, hyperscaling pitfalls
Learning materials: coming soon
Registration: Please register at this link
Schedule for the day in the EEST (Helsinki, Oslo+1) timezone:
09:50-10:00 Streaming starts with icebreakers https://www.twitch.tv/coderefinery
10:00-12:00 Episode 4 - Parallelization and workflows
Parallelization with HPC
Workflow automation
Hyperscaling pitfalls
12:00-13:00 Lunch (on your own)
13:00-14:30 Hands-on exercises on zoom (register to receive link)
Prerequisites
You won’t be able to engage with the exercises and examples of the course if you don’t have access to an HPC cluster. Employees of higher-education institutions can usually request access to HPC resources; if you are unsure, please get in touch with your local support. Being familiar with the basic tools used with HPC and remote computing is fundamental for this course, so familiarize yourself with the Linux command line. You should also be familiar with the basic concepts and rules of HPC systems; you can watch our past training “Introduction to HPC (aka kickstart)”.
Credits
It is possible to receive 1 ECTS. Here is what is required:
be affiliated with a research organisation. Your submission must come from an email address of a research organisation.
attend the four zoom exercise sessions. During the zoom session, send a zoom chat message to Enrico Glerean to mark your presence. You can miss at most one session; please arrange an extra task with Enrico Glerean to compensate for the absence.
Submit a tar or zip file with four folders, one folder for each of the four episodes. Inside each folder include the scripts, code, and commands that you wrote and ran during the exercise sessions. Please make sure that all the files submitted have clear comments that explain each of the steps in relation to the exercises and what was done in the zoom session. Provide the output of each of the scripts or commands that you ran (for example, a copy-paste from the terminal into a txt file is enough). If the output is very long, it is OK to just copy what is left visible in the terminal.
Submit a learning diary for each episode: a short text that highlights i) what went well with the episode, ii) what could be improved, iii) how you will use what you have learned.
From your organisation’s email address, email all these files to scip _at_ aalto.fi by the last day of May 2024. Learners at Aalto University: please include your student number to get the credit registered automatically. Learners from other universities: you might want to check with your study coordinator if you can convert the certificate from this course into 1 ECTS. If they have questions, you can tell them to get in touch with Enrico Glerean
Questions
Q: Can I get a certificate even though I am not affiliated with a University or other research organisation?
A: Unfortunately we provide credits only for students or researchers affiliated with research organisations.
Q: I received a calendar invitation only for one of the episodes, but I marked that I want to register for all episodes, how can I get a calendar invitation?
A: We do not have a clever system for sending multiple calendar invitations at once. If you find calendar invitations useful, you need to register manually to each of the four episodes.
Q: The materials are not yet ready, when will they be ready?
A: This is the first run ever for this course, so we are still tweaking learning materials until the last minutes before the course. Your feedback is highly appreciated to turn this pilot into a course that we can run again in the future. Consider contributing to the learning materials by joining the CodeRefinery Zulip chat.
Contributors and Acknowledgments
Course coordinator: Enrico Glerean.
Episodes coordinators: Richard Darst, Samantha Wittke, Simo Tuomisto, Enrico Glerean, Thomas Pfau
Contributors to learning materials: Richard Darst, Samantha Wittke, Simo Tuomisto, Enrico Glerean, Thomas Pfau, Radovan Bast, Diana Iusan, Dhanya Pushpadas, Hossein Firooz, Jarno Rantaharju, Maiken Pedersen.
Communication partners: CSC, University of Tromsø, University of Bergen, Uppsala University, University of Oslo.
See also / more info
Chat with us in the CodeRefinery chat or Aalto SciComp chat. Or private contact via Enrico Glerean, scip -a-t- aalto.fi.
June 2024 / Intro to Scientific Computing / HPC Summer Kickstart
Quick links
This page is generated based on the 2023 version. The information and schedule will still be updated - expect significant schedule changes.
Registration is not yet open.
Kickstart is a three half-day course for researchers to get started with high-performance computing (HPC) clusters. The first day serves as a guide to your career: a map of the types of resources available and the skills you may need, so that you can be prepared when you need more in the future. This part is especially suitable for new researchers or students trying to understand the computational/data-analysis options available to them. It won’t go too deep into anything, but will provide you with a good background for your next steps: you will know what resources are available and the next steps to use them.
The second and third days take you from being a new user to being competent to run your code at a larger scale than you could before using a computer cluster. This part is good for any researcher who thinks they may need to scale up to larger resources in the next six months, in any field - this is many new researchers in our departments. Even if you don’t use computing clusters, you will be better prepared to understand how computing works on other systems. If you are a student, this is an investment in your skills. By the end of the course you get the hints, ready solutions and copy/paste examples on how to find, run and monitor your applications, and manage your data.
If you are at Aalto University: the course is obligatory for all new Triton users and recommended to all interested in the field.
This course is part of Scientific Computing in Practice lecture series at Aalto University, supported by many others outside Aalto, and offered to others as part of CodeRefinery.
Practical information
This is a livestream course with distributed in-person exercise and support. Everyone may attend the livestream at https://twitch.tv/coderefinery, no registration needed, and this is the primary way to watch all sessions. Aalto has an in-person exercise and support session (location TBA), as do some other partners, and a collaborative document is used for a continuous Q&A session.
Time, date: 4 – 6 June 2024 (Tue–Thu). 11:50-16:00 EEST
Place: Online via public livestream, Zoom exercise sessions for partners, and probably in-person discussion/practice rooms at some campus.
Registration: Please register at this link: TODO . It’s OK to attend only individual sessions.
Cost: Livestream is free to everyone. Aalto in-person is free of charge for FGCI consortium members including Aalto employees and students.
Additional course info at: scip@aalto.fi
Other universities
If you are not at Aalto University, you can follow along with the course and will learn many things anyway. The course is designed to be useful to people outside of Aalto, but some of the examples won’t directly work on your cluster (most will, and we will give hints about adapting the rest). How to register if you are not at Aalto:
Regardless of where you are from, you may use the primary registration form to get emails about the course - you won’t get anything else from it.
Participants from the University of Helsinki can learn how to connect to their Kale/Turso cluster by following their own instructions.
Participants from University of Oulu: please follow instructions on how to access the Carpo2 computing cluster.
Tampere: this course is recommended for all new Narvi users and also all interested in HPC. Most things should work with simply replacing triton -> narvi. Some differences in configuration are listed in Narvi differences
[no active support] CSC (Finland): Participants with a CSC user account can also try the examples on CSC supercomputers; see the overview of CSC supercomputers for details on connecting, etc.
If you want to get your site listed here and/or help out, contact us via the CodeRefinery chat (#kickstart-aalto stream). We have docs for other sites’ staff to know what might be different between our course and your cluster.
Schedule
All times are EEST (Europe/Helsinki time)!
The daily schedule will be adjusted based on the audience’s questions. There will be frequent breaks and a continuous question time going on; this is the mass equivalent of an informal help session to get you started with the computing resources.
Subject to change
Schedule may still have minor updates, please check back for the latest.
Day #1 (Tue 4.jun): Basics and background
11:50–12:00: Joining time/icebreaker
12:00–12:10 Introduction, about the course Richard Darst and other staff Materials: Summer Kickstart intro
12:10–12:25: From data storage to your science Enrico Glerean and Simo Tuomisto
Data is how most computational work starts, whether it is collected externally, generated by simulation code, or produced some other way. And these days you can work on data even remotely, and these workflows aren’t obvious. We discuss how data storage choices lead to computational workflows. Materials: SciComp Intro
12:25–12:50: What is parallel computing? An analogy with cooking Enrico Glerean and Thomas Pfau
In workshops such as this, you will hear lots about parallel computing and how you need it, but you rarely get an understandable introduction to how they relate and which is right for you. Here, we give an understandable metaphor using the preparation of large meals. Slides
13:00–13:25: How big is my calculation? Measuring your needs. Simo Tuomisto and Thomas Pfau
People often wonder how many resources their job needs, either on their own computer or on the cluster. When should you move to a cluster? How many resources to request? We’ll go over how we think about these problems. Materials: How big is my program?
13:25–13:50: Behind the scenes: the humans of scientific computing Richard Darst and Teemu Ruokolainen
Who are we that teach this course and provide SciComp support? What makes it such a fascinating career? Learn about what goes on behind the scenes and how you could join us.
14:00–14:45: Connecting to a HPC cluster Thomas Pfau and Jarno Rantaharju
Required if you are attending the Triton/HPC tutorials the following days, otherwise the day is done.
14:00–14:20?: Livestream introduction to connecting
14:??–15:00: Individual help time in Zoom (links sent to registered participants)
Break until 15:00 once you get connected.
Material: Connecting to Triton
15:00–15:25: Using the cluster from the shell (files and directories) Richard Darst and Teemu Ruokolainen
Once we connect, what can we do? We’ll get a tour of the shell, files and directories, and how we copy basic data to the cluster. Material: Using the cluster from a shell.
15:25–15:50: What can you do with a computational cluster? (Jarno Rantaharju and Richard Darst)
See several real examples of how people use the cluster (what you can do at the end of the course): 1) Large-scale computing with array jobs, 2) Large-scale parallel computing. Demo.
Preparation for day 2:
Remember to read/watch the “shell crash course” (see “Preparation” below) if you are not yet confident with the command line. This will be useful for tomorrow.
Day #2 (Wed 5.jun): Basic use of a cluster (Richard Darst, Simo Tuomisto)
11:50–12:00: Joining time/icebreaker
12:00–12:05: Introduction to days 2-3
12:05–12:30 Structure of a cluster: The Slurm queueing system
12:30–15:00: Running your first jobs in the queue
15:00–15:30: Other things you should know about the HPC environment
15:30–16:00: Q&A
Day #3 (Thu 6.jun): Advanced cluster use (Simo Tuomisto, Richard Darst)
11:50–12:00: Joining time/icebreaker
12:00–12:30: What does “parallel” mean?:
12:30–14:00: Forms of parallelization
14:00–14:30: Laptops to Lumi
You now know the basics of using a computing cluster. What if you need more than what a university can provide? CSC (and other national computing centers) have even more resources, and this is a tour of them. Slides from 2022 here.
14:40–15:30: Running jobs that can utilize GPU hardware:
15:30–16:00: Ask us anything
Preparation
We strongly recommend you are familiar with the Linux command line. Browsing the following material is sufficient:
Basic Linux shell and scripting (important) (or read/watch the shorter crash course / video)
How to attend: Online workshops can be a productive format, but it takes some effort to get ready. Browse these resources:
Attending a livestream workshop, good to read in detail (ignore the CodeRefinery-specific parts).
How to use HackMD to ask questions and hold discussions.
Technical prerequisites
Software installation
SSH client to connect to the cluster (+ be able to connect, see next point)
Zoom (if attending breakout rooms)
Cluster account and connection verification:
Access to your computer cluster.
Aalto: if you do not yet have access to Triton, request an account in advance.
Then, connect and get it working
Aalto (and possibly useful to others): try to connect to Triton to be ready. Come to the Wednesday session for help connecting (required).
Next steps / follow-up courses
Keep the Triton quick reference close (or equivalent for your cluster), or print this cheatsheet if that’s your thing.
Each year the first day has varying topics presented. We don’t repeat these every year, but we strongly recommend that you watch some of these videos yourself as preparation.
Very strongly recommended:
When and how to ask for help (very useful)
Git intro (useful)
Other useful material in previous versions of this course:
Scientific Computing workflows at Aalto - concepts apply to other sites, too (optional): lecture notes and video, reference material.
Tools of scientific computing (optional): lecture notes and video
While not an official part of this course, we suggest these videos (co-produced by our staff) as a follow-up perspective:
Attend a CodeRefinery workshop, which teaches more useful tools for scientific software development.
Look at Hands-on Scientific Computing for an online course to either browse or take for credits.
Cluster Etiquette (in Research Software Hour): the Summer Kickstart teaches what you can do; this covers what you should do to be a good user.
How to tame the cluster (in Research Software Hour). This mostly repeats the contents of this course, with a bit more discussion, and working one example from start to parallel.
Community standards
We hope to make a good learning environment for everyone, and expect everyone to do their part for this. If there is anything we can do to support that, let us know.
If there is anything wrong, tell us right away - if you need to contact us privately, you can message the host on Zoom or contact us outside the course. This could be as simple as “speak louder / text on screen is unreadable / go slower” or as complex as “someone is distracting our group by discussing too advanced things”.
Material
See the schedule
Course archive
Currently active (upcoming) courses have been moved to the training index. Below is a list of past courses.
This course list used to be at the science-it.aalto.fi/scip page, but that page is now deleted. This series has existed since 2016.
2020
MPI introduction (February 2020)
Hands-on Molecular Dynamics with LAMMPS (February 2020)
Linux shell scripting (March 2020)
Matlab advanced (April 2020)
Mega CodeRefinery (June 2020, materials, videos)
FGCI kickstart (June 2020)
Linux shell basics (September 2020)
Python for scientific computing (September 2020, materials)
Data analysis workflows with R and Python (October 2020, materials)
CodeRefinery online (October 2020, materials)
GPU computing in practice (December 2020)
2021
Introduction to Data Analysis strategies at Aalto, Linux shell and HPC kickstart 2021 (Jan/Feb 2021, materials part 1, materials part 2, videos)
Introduction to MPI (March 2021)
Linux Shell Scripting (March 2021, materials)
Hands-on data anonymization (April 2021, videos: day1, day2, day3, day4)
Code Refinery workshop (May 2021, materials, videos)
Software design for scientific computing (April 2021, materials)
Matlab Advanced (May 2021, materials)
Introduction to Julia (August 2021 & October 2021, materials)
Python for Scientific Computing (October 2021, materials, videos)
Linux Shell Basics (November 2021, materials)
Matlab Basics (November 2021, materials)
2022
CodeRefinery workshop (September 2022)
2023
Announcement maillist
Events and other Aalto Scientific Computing (Science-IT) announcements are distributed over several lists, such as the Triton-users and department mailing lists. In addition, we run the scicomp-announcements@list.aalto.fi maillist, which covers everyone else who wants to stay tuned and receive Science-IT news.
The moderated list is free to subscribe to / unsubscribe from at any time, and accepts all email addresses, including non-Aalto ones.
Future courses: Autumn 2023 - Linux Shell, CodeRefinery, Python for Scientific Computing, … and more! We are always adding interesting courses, so please check this page once in a while. If you are interested in a re-run of our past courses or if you want to suggest a new course, please take this survey.
Anyone can sign up for announcements at the SCIP announcement mailinglist.
Our most important courses
These are the most important courses we recommend to new users:
These are other quite important courses we have developed:
Linux shell: https://aaltoscicomp.github.io/linux-shell/
Python for Scientific Computing: https://aaltoscicomp.github.io/python-for-scicomp/
Data analysis workflows in Python and R: https://aaltoscicomp.github.io/data-analysis-workflows-course/
Other interesting courses
Data management, Reproducibility, open science
Other relevant courses by Aalto Open Science team will be listed at: https://www.aalto.fi/en/services/training-in-research-data-management-and-open-science
Other courses on scientific computing and data management
Please check https://mycourses.aalto.fi/ for other courses at Aalto and https://www.csc.fi/en/training for training courses and events at CSC.
MOOC on scientific computing:
Skills map
There is a lot to learn, and it all depends on each other. How do you get started?
Our training map Hands-on Scientific Computing sorts the skills you need by level and category, providing you a strategy to get started.
In order to do basic scientific computing, level C (Linux and shell) is needed. To use a computer cluster, level D (Clusters and HPC) is useful. Level E (scientific coding) is useful if you are writing your own software.
Help
Don’t go it alone - we are here! There is all kinds of “folk knowledge” about using the tools of scientific computing efficiently, and we would like to help you learn it. In particular, our community is welcome to come to our SciComp garage, even for small random chats about your work, but there are plenty of other ways to ask for help, too.
Help
There are many ways to get help with your scientific computing and data needs - in fact, so many that it can be hard to know which to use. This page lists how to ask for help, for different kinds of needs.
Video
Wondering whether, or how, to ask for help? Video: When and how to ask for help (slides).
I don’t know my exact question, or even if I should have a question:
SciComp garage to discuss, or SciComp chat brainstorming.
Well-defined task and end goal:
Search scicomp.aalto.fi or the issue tracker for answers; then ask a SciComp chat question (small questions), or make a SciComp issue tracker post (big questions).
Significant or open-ended problem solving:
Open an issue at the issue tracker so we can keep track, and possibly drop by the SciComp garage to discuss details; then, if needed, we’ll create a Research Software Engineer project on the topic (you could also start there), or co-work on it at the SciComp Garage.
Issues with your own Triton account:
Email scicomp@aalto.fi (account issues only, not general questions); then, if urgent, SciComp chat (e.g. “is Triton down for others?”).
General needs at Aalto University, not related to SciComp:
servicedesk@aalto.fi for IT issues, or researchdata@aalto.fi for research data related topics.
Don’t forget that you can and should discuss among your research group, too!
Formulate your question
We get many requests for help which are too vague to give a useful response; we then delay while trying to come up with something better than “please explain more”, which slows everything down. So, when sending us a question, always try to clarify these points to get the fastest solution:
Has it ever worked? (If so, what has changed?)
What are you trying to accomplish? (Your ultimate goal, not current technical obstacle.)
What did you do? (Be specific enough to be reproducible - copy and paste exact commands you run, exact output messages, scripts, inputs, etc.)
What do you need? Do you need a complete solution, pointers to get started, or should we say if it will take too long and we recommend you think of other solutions first?
If you don’t know something, it’s OK, just explain the best you can and we’ll go from there! You can also chat with us to brainstorm about issues in general, which helps to figure out these questions. A much more detailed guide is available from Sigma2 documentation.
We don’t need a long story in the first message - we’ll ask for more later. Try to cover these points, and we are happy to get your message.
Aalto Scientific Computing
Aalto Scientific Computing (Science-IT) is focused on all aspects of computing and data, and mostly consists of PhD-level researchers, so we can understand what you are doing, too. Our main focus areas are high-performance computing (Triton), research software (RSEs), data, and training.
Problems with Triton, using Triton
Help with software on Triton
Data advice, FAIR data, confidential data, data organization
Suggestions on tools and workflows to use
General research software and research tools
Advice on other Aalto services
Advice on using CSC services
Triton Accounts (by email)
Increasing quotas, requesting group storage space (by email)
Scicomp garage
Link
https://aalto.zoom.us/j/61322268370, every workday at 13:00
Planned disruptions
There are currently no planned disruptions to the daily garage.
If you need more help than the issue trackers, this is the place to be. It’s not just Triton, but all aspects of scientific computing.
Come if you want to:
Solve problems
Discuss and figure out what your problem really is
Brainstorm the best strategy for your problems
Work with someone on your issues in real time
Network with others who are doing similar work and learn something new
What kinds of issues we can help with:
Code and Software:
Issues with your code or software tools you use (e.g. debugging, setting up software, linking libraries)
Code parallelization
Code versioning, git, testing
Data Management:
Data management plans, data sharing
Handling of sensitive data and general legal and ethical (to some extent) questions about research data
Workflows for big datasets
Data versioning
Triton cluster:
Slurm job submissions
Cluster usage
Script setup
Module management / Library loading
General:
Basic methodological or statistical issues
Notes:
All garages are designed for researchers and staff working at Aalto (or those who have a need to contact us).
You don’t have to have a specific question, you can come by just to chat, listen, or figure out if you should have a question.
You can also chat with us any other time (no promises on reply time, though).
Triton, SciComp, RSE, and CS
Link
https://aalto.zoom.us/j/61322268370, every workday at 13:00
You can meet us online, every workday, at 13:00, online via zoom. Imagine this like walking into our office to ask for help. Even if you are not sure whether we can help you, come and chat with us anyway and we can figure it out.
This doesn’t replace email or the Triton issue tracker for clearly-defined tasks. Garage is good for discussion, brainstorming, and deciding the best path. If in doubt, come to garage and we will help you decide. Many people make an issue, then come to garage to discuss.
Try to arrive between 13:00 and 13:15; we may leave early if no one is around. Please don’t arrive early, since we have other meetings then.
We have some special days (see list below) to ask about specific topics, but in reality we can answer any question any day.
Join on Zoom via https://aalto.zoom.us/j/61322268370 .
NBE/PHYS
PHYS, NBE, and ITS (Aalto IT Services) staff are part of the Garage sessions every Monday and Wednesday. Regular reminders are sent to the department personnel lists.
Special days
Some days are special, and have extra staff about certain topics. But you can always visit on any day and ask any question, and we can usually give a good answer (especially about Triton, HPC, computing, software, and data).
Mondays also have NBE/PHYS IT present.
Tuesdays: We are continuing the COMSOL Multiphysics focus days in Spring 2024: someone from COMSOL (the company) plans to join our Zoom garage at 13:00 on the following Tuesdays: 2024-01-23, 2024-02-27, 2024-03-26, 2024-04-23, 2024-05-28.
Wednesdays also have NBE/PHYS IT present. We also have more staff to help jupyter.cs instructors/TAs.
Thursdays
Fridays also have CS IT present (at the beginning).
Others
Aalto IT services runs something similar for some other schools and departments.
In person
In-person garages haven’t been held since early 2020 for the obvious reason. The online garage above is more frequent and you are more likely to meet the very best person for your topic.
Past events
Scicomp Garage has existed since Spring 2017. It has been online since March 2020, and daily since summer 2020.
SciComp community
Let’s face it: we learn more from each other than from classes. There is a major problem with inequality in the computational sciences, and a large part of it relates to how we learn these tools. Join the Aalto Scientific Computing community to help you and others be the best scientists you can be. You can:
Network with others within a supportive mentoring community.
Share knowledge among ourselves, avoid wasting time on things where someone knows the answer.
Take part in developing our services - basically, be a voice of the users.
SciComp Garage and issues
Currently, most of our interaction happens in the SciComp Garage, a daily meeting where we help others (and learn ourselves). If you hang out there, you will learn a lot.
If you subscribe to the Triton issue tracker, you will see a lot of questions and answers, and thus learn a lot.
Aalto community chats
We have weekly chats for Aalto scientific computing powerusers/RSEs as a way to network with the community and Aalto staff. Currently, these happen at 10:00 on Thursdays as part of the Nordic-RSE Finland chats. Anyone is welcome to join and discuss Aalto-related topics.
Mailing lists
If you have a Triton account, you are on the triton-users mailing list already. Many training and other events are announced there.
If you do not have a Triton account, the scicomp-announcements mailing list provides the same information. Subscribe here.
Join our Research Software Engineer mailing list for information on research software related topics and the RSE community at Aalto, possibly including discussion and internal job advertisements.
Chat
Join the Aalto Scientific Computing group on Aalto Microsoft Teams; the invite code is e50tyij. In practice, we watch it for questions, but it’s not the most active place (though it could be). We often hang out on the CodeRefinery chat, and there is an #aalto stream there.
User groups
Often, there is specialized software or a problem domain which needs more advanced documentation than the generic HPC talks. Often, the SciComp staff aren’t experts in that particular domain, so we can’t provide immediate help without knowing more. For this, we have user groups: we meet with groups of users to discuss problems and create solutions/documentation about them.
Existing user groups
To be formed.
If you would like to create a user group, let us know. The hardest part is finding the users, so if you form the group of people and schedule a time, it is very easy for us to come. To be clear, if you bring people together and want to organize the group, we are very happy and will take part and make it “official”.
User group meetings
A user group meets periodically and does various things. The meetings include some SciComp staff as well as interested users who want to make a larger change than just solving their own problems. Together, we:
See examples of the software or problem in practice.
Discuss the best solutions to problems
Collaboratively create documentation on the problem (which can be put straight at scicomp.aalto.fi, for example in Applications: General info). We can create video demos, examples, and more.
Discuss how the infrastructure needs to be adapted to the actual use cases.
Provide a network for informal support within research groups.
Preparing for a user group
We will create a Triton issue about it and use that for communication. Subscribe (= turn on notifications or comment) to the issue to get emails about it.
Please submit some examples to the issue tracker, for example either things which already work (discuss + document) or things that don’t yet (we will work together to improve + document). This will form the main part of the meeting. We need examples!
Group meetings
This page applies to these departments so far: CS, NBE, PHYS (if others want to join, let us know).
We would like to meet with each research group once a year. This isn’t to advertise stuff to you, but to hear what you all need but can’t get, so that we can help you with that. A group meeting consists of your group plus other technical services staff (Science-IT, CS-IT, etc.) which are relevant for your group’s work. Hopefully, we can immediately solve some of your major problems. Your group will come away better able to use the best possible services, and we will come away knowing what to focus on in the next year.
Practical matters
Ideally, someone (Science-IT, CS-IT, etc.) contacts your group leader to arrange a time. On the other hand, you can contact your most local (department) support anytime to arrange a group meeting - we are always happy for an eager audience. Your local support will ask all the other relevant parties to be there.
The group meeting would happen whenever is most convenient for you - for example, during your regular group meetings. Please propose the best times for you. One hour is sufficient.
You don’t need any particular preparation. If you do anything, think about what computational/data/software tools you use and what problems you have - you could have one or a few people tell about the typical workflows of the group.
Who we are, what we do
We are technical services staff (in particular, those focused on computing). See the rest of scicomp.aalto.fi for the types of things we support. Welcome, researchers! describes our most important services for you (plus the most important ones by others at Aalto).
At Aalto, you also have these other major service units which are relevant to you (this meeting isn’t mainly about them, but we have inside knowledge of IT Services, so we can help there):
IT Services (ITS): General mass-consumption IT services for all Aalto.
Research services: applying for grants, administrating, legal, etc.
Learning services: teaching
Communication services
Finance
HR
Topics
Reminder of services available at Aalto and your department (short)
News: Latest changes or improvements (short)
Stories from the field: how do you do your work?
Feedback: How do you do your work now? What works well? What doesn’t work well? What do you need in the future? Tell us all your complaints, because we can’t work on the right things without them. (long)
News / topical items, 2022
GPUs: limited numbers, future procurement, using more efficiently.
RSE service, where and how to use.
Have you seen our latest teaching?
Discussion starters

(Figure: The types of research service needs you may have, sorted into different levels of concern.)
Data
Data-driven research: need more support?
Department (project, archive), Triton (scratch), cloud, any other needs?
Management: collection, storage, transfer, archive, sharing.
What do you usually use?
Sensitive data: support and storage locations
Computing
Cloud vs shared workstations vs personal workstations vs laptops
Desktops, laptops
Scientific computing
GPUs
Containers for difficult-to-run software (Docker, Singularity, etc.)
Virtual machines
CSC (supercomputers, cloud, data, collaboration between universities in Finland)
Usability and accessibility (user interfaces)
Virtual desktops, VDI
Jupyter (jupyter.triton)
Other (Open OnDemand, …)
Usability and accessibility in general in the modern world
Teaching
Learning Services
Online solutions on cloud platforms (local solutions, VMs, Azure)
Chat: Zulip, Teams, Slack, …
Software
Installation problems
Reusing old software
Support
Support channels
Daily SciComp garage - every workday, 13:00, online.
Chat
Software development: (tools, best practices, collaboration)
How to more closely support teaching/research
General services
WWW servers
CSC services
Email
Printing
Technical procurement
Open Science / Open Data / Open Access
See also
Website
Search this website for help. For that matter, also search the internet as usual. This is usually a good place to start, but often you need to move on to the next steps.
Triton Issue tracker
The Triton issue tracker is where all Triton issues (and anything related to Triton) should go. Log in and search the issue tracker for related issues first; you may find the solution is already there.
Garage
Link
https://aalto.zoom.us/j/61322268370, every day at 13:00
Daily SciComp Garage sessions, where you can informally chat. This is especially useful when your question is not yet fully defined, or you think that demonstrating the problem for immediate feedback is useful.
Chat
Chat can be a great way to quickly talk with others, share tips, and quickly get feedback on if a problem is large or small, something to get help with or figure out yourself, etc. For longer solutions, we will direct you to the issue trackers but it rarely hurts to do a real-time discussion. (For real-time video chat with screen sharing, come to the garage above).
The SciComp Zulipchat, scicomp.zulip.cs.aalto.fi, is where we most often hang out. You can ask Triton questions in #triton, general questions in #general, research software engineering questions in #rse, etc. The main point of Zulip is topics, which allow you to name threads and easily follow old information. (Use Zulip in your courses!)
You can also chat with us on Aalto Microsoft Teams; the invite code is e50tyij. Our staff also hang out on other department chats.
Research Software Engineer service
Sometimes, a problem goes beyond “Triton support” and becomes “scientific computing support”. Our Research Software Engineers are perfect for these kinds of problems: they can program with you, set up your workflow, or even handle all the technical problems for you. Contact via the other contact methods on this page, especially via the garage.
Email
scicomp@aalto.fi: use this only for things related to your account (requesting a Triton account), quota, etc. - most other things go to the tracker above.
rse-group@aalto.fi: Research Software Engineering service requests. (It’s usually better to drop by the SciComp garage, since we usually need to discuss more.)
Department IT
CS, NBE, and PHYS have their own IT groups (among others, but those are the Science-IT departments with the most support). They handle local matters and can reliably direct you to the right resources. Department IT handles:
Computers, laptops, personal devices
Department data storage spaces
Other department-managed tools and services
Reach them by department-specific email addresses
NBE and PHYS IT use the same email issue tracker (esupport) as Aalto IT, so issues can be exchanged no matter which address you send an issue to. CS uses a different one, so you have to think a bit more before sending something.
Community
In addition to formal support, there are informal activities, too:
The daily SciComp Garage, designed to provide one-on-one help, but we invite anyone to come, hang out in the main room, and network with us. This is for basic help and brainstorming.
Subscribe to notifications from the Triton issue tracker even if you don’t post there. You will learn a lot.
Sign up for the Research software engineers and powerusers mailing list and learn about more events that interest you. This isn’t the place to ask for basic help, but if you hang out here you will learn a lot.
Other groups at Aalto
servicedesk, Aalto IT
servicedesk@aalto.fi is the general IT Services service desk. They can handle things to do with accounts, devices, and so on. They have a wide range of responsibilities, but don’t always know about local resources that may be more appropriate for your needs. There is an “IT Services for Research” group which focuses on research needs.
For students (who aren’t also researchers), this is always your first point of contact - in addition to your teacher.
servicedesk handles:
Aalto accounts, passwords (including Triton passwords)
University-wide data storage (work, teamwork, home directories)
All university-wide common IT infrastructure: wifi, network, devices, websites, learning platforms, etc.
Any department-level matters, when you are not in a department with local IT staff.
Reach them by:
Browse the IT Services for Research list
Research services
Aalto Research Services function more as project administrative services than as close research support. They provide important help for:
Data management plans for funding applications, other data-related questions, and Open Science (contact researchdata@aalto.fi)
Legal or ethical advice, making contracts and NDAs.
Library services
Applying for funding and administering it.
In many cases, you can chat with Aalto Scientific Computing and we can give some initial practical advice and direct you to the right Research Services resources.
Reach research services by:
Contacting service email addresses at the link above
Contacting school representatives findable at the link above
researchdata@aalto.fi for data-related things
About us
Aalto Scientific Computing isn’t an HPC center - we provide HPC services, but our goal is to support scientific computing no matter what resources you need. Computing is hard, and we know that support is even more important than the infrastructure. If you are a unit at Aalto University, you can join us. [Mastodon, Twitter]
About
Computational research is one of the focus areas in Aalto University, and Aalto Scientific Computing makes that possible.
The Science-IT project was founded in 2009 (with roots going back much further) and has since expanded from high-performance computing services to a complete package: we provide computation, data management, software, and training. Our partnerships with departments and central IT services allow a streamlined experience from personal devices to the largest clusters.
To reflect this expanded mission and our partners, we have rebranded as Aalto Scientific Computing.
Many Centres of Excellence and departments at Aalto University are using our resources with great success. There are currently over 1000 user accounts from all six different schools and at least 14 different departments using our resources. Science-IT is administered from the School of Science with additional university-level funding - our HPC services are available to all Aalto University, free of charge.
Boilerplate text for grant proposals
Below are various texts which describe Aalto Science-IT, Aalto ITS, and CSC resources, suitable for inclusion in grant applications and the like. There are various types suitable for different purposes.
If you create your own texts and would like to share them, send them to us.
Warning
These texts are starting points, not something that should be included as-is. The texts need to be adapted and tailored to fit your particular proposal - if you need help with proposal writing you can contact the Grant Writer or Research Liaison Officer of your School for advice (contact information is available here).
Focus on Triton
Computing and modelling are strategic areas of Aalto University. To support research in these areas, the university is committed to providing proper hardware resources and supporting personnel on a long-term basis. Currently, Aalto Science-IT provides a system with about 10000 computing cores. The system also contains over 200 NVIDIA cards for GPU computing and over 5 PB of fast storage capacity suitable for Big Data needs. All parts are connected with a fast Infiniband network to support parallel computing and fast data access. To keep the resources competitive, Aalto Science-IT upgrades the system annually based on the needs of researchers.
All resources are integrated with the national resources, allowing easy migration to even larger resources when necessary. These include, for example, university-dedicated OpenStack-based cloud resources and access to thousands of servers via the national computing grid. Furthermore, Aalto Science-IT provides a large amount of preconfigured software and hands-on support to make usage as effective as possible for researchers. On the personnel side, Science-IT has ten permanent PhD-level staff to keep the system running and to provide teaching and consultation for researchers.
Acknowledging Triton in publications
Remember you need to acknowledge Aalto Science-IT in your papers if you use Triton and its scratch filesystem. See the acknowledging Triton page for instructions on how to do that and some boilerplate text.
Focus on data
Computing and data are strategic areas in Aalto University.
The university provides data management and computing solutions throughout the data lifecycle. The university provides free storage to researchers of essentially unlimited size, provided that the data is managed well. Data storage includes 5 PB of high-performance non-backed-up Lustre filesystem space connected directly to the Triton computing cluster for efficient and secure analysis, and 1 PB of reliable backed-up storage space for longer-term storage. Expert staff, both technical and administrative, provide advice and hands-on support in data storage, computation, FAIR principles, and data management planning.
Data management is designed with a focus on security. Recommended storage locations are centrally located for security. Computing nodes and Lustre data storage servers are physically located at CSC, Keilaranta 14, Espoo. The server room is certified security level 3 (VAHTI-3) i.e. only authorized personnel with clearance are given access to it and there is continuous camera surveillance. All data is access controlled by passwords and individual-level authorization, and firewalled to university networks.
Aalto ITS data storage is directly integrated into Aalto’s sustainable computing environment. Storage is double-redundant and includes the possibility to roll back to previous points in time, with disaster recovery management. In addition to confidential data processing, there are multiple encrypted and/or audited storage environments for sensitive data processing. For IoT, Aalto ITS utilizes public cloud computing providers for case-specific construction of services. Aalto has IT infrastructure personnel, who can help researchers with building the relevant solution for the use case.
Focus on sensitive data
Aalto university provides secure solutions for data management and computing throughout the data lifecycle. The university has an Information Security Management System (ISMS) in place, adapted from the ISO 27001 standard. These processes govern how all our IT systems are being acquired, developed, implemented, operated and maintained. Based on the information classification, we use only selected systems that comply with high security requirements and have been approved for use with sensitive data.
We use encryption technologies to safeguard sensitive data in transit and ensure secure collaboration. Our secure network storage is encrypted at rest, includes the possibility to roll back to previous points in time, and supports encrypted backups for disaster recovery.
We operate a dedicated secure computing environment, SECDATA, to enable research with the most sensitive data. The environment has been audited to comply with the Act on the Secondary Use of Health and Social Data and Findata requirements. Each research project gets a separate virtual desktop environment with customized amounts of memory, disk space, and computing power, with the possibility to use GPUs for computational tasks. To safeguard data, transfers are limited and done only through a specific audited process, and the environment is disconnected from the public internet.
Our technical, administrative, and legal experts provide advice and hands-on support for handling sensitive data. The Aalto Research Software Engineer (RSE) team and Data Agents help with essential privacy techniques such as minimization, pseudonymization, and anonymization. Aalto’s Data Protection Officer provides guidance and oversight on the processing of data and ensuring privacy.
Confidential data (shorter, for CS)
Aalto CS provides secure data storage for confidential data. This data is stored centrally in protected datacenters and is managed by dedicated staff. All access is through individual Aalto accounts, and all data is stored in group-specific directories with per-person access control. Access rights via groups are managed by IT, but data access is only provided upon the request of the data owner. All data is made available only through secure, encrypted, and password-protected systems: it is impossible for any person to get data access without a currently active user account, password, and group access rights. Backups are made and also kept confidential. All data is securely deleted at the end of life. CS-IT provides training and consulting for confidential data management.
Focus on connectivity
Aalto researchers can use the Low Power Wide Area Network (LoRaWAN), a data network for Internet of things (IoT) devices with nationwide coverage, free of charge. Using this network, a device can send a small amount of data with minimal power which makes batteries last long. LoRaWAN is suitable for static and mobile sensors that are operated by batteries. Aalto IT services provide support and configure the network together with the user. In Finland, public mobile networks support also NB-IoT (Narrowband IoT) technology.
The Aalto campus area has a specific research environment for 5G connectivity, which can be used for developing and testing 5G technology and applications. On the campus, connectivity is ensured via a 100 Gbit/s fault-tolerant internet connection, 1 – 10 Gbit/s connections to workstations and servers, and extensive wireless coverage. Secure connectivity outside the Aalto campus is also possible via various technologies, e.g. VPN.
Research environment: research software engineers
The Aalto Research Software Engineer (RSE) team provides specialized advice and services in research software, data, and computing, so that any researcher can accomplish the best science without being held back by technological problems. Typical tasks include implementing a method better or faster than could otherwise be done, or ensuring that results are as open and reusable as possible so that the full impact of the work can be realized. RSE staff are professional researchers with years of experience in computational sciences, and work seamlessly with the rest of the Science-IT team. For the School of Science, basic services are included as part of overheads; longer-term services can be funded from specific research projects.
Research software engineering services
See also
(this text must be tuned to your grant, replace the parts in CAPITAL LETTERS)
This grant will make use of the Aalto Research Software Engineer program to hire high-quality TOPIC specialists. This program provides PhD-level personnel to work on THINGS, which allows the other staff on this project to focus on YYY. Research software engineers do not need to be independently recruited, and are available for consultation also before and after the project. This service is provided by Aalto Scientific Computing, which also provides high-performance computing resources for your project. The Research Software Engineering service is integrated into computing services as a consistent package.
(for basic service, for now only SCI) The service is available as a basic consulting service for free.
(for paid services) This project receives dedicated service from the Research Software Engineering group, funded as researcher salary from this grant. During this period, one of the Aalto research software engineers joins this project as a researcher, equal to all other project employees.
Other computing and IT solutions
Please note that the boilerplate texts for the computing solutions listed below are not about the Aalto Triton HPC cluster. Please familiarize yourself with the Aalto cloud computing services and CSC services before you include them in your grant application. Please also refer to their terms of service and pricing if you need to mention these in your application.
Focus on cloud computing
Aalto University has agreements with major public cloud service providers (e.g. Microsoft Azure, Google Cloud Platform and Amazon Web Services), and the platforms have been integrated into the Aalto digital environment in a secure and well-governed manner. The platforms provide scalable, collaborative, and integrated computing tooling with software for rapid iteration on data using for example machine learning or access to ready-made AI API’s for [YOUR TOPIC / IMAGE DETECTION / TEXT ANALYSES].
Aalto has private and secure network connectivity between on-premises environment and the cloud platforms, and access is managed through a central identity management system. Expert staff provide solution consultation and hands-on support for end-user needs.
Focus on CSC
Aalto researchers have access to services from the Finnish IT Center for Science (CSC), a government-owned center which provides internationally high-quality ICT expert services. These services include multiple use-case-specific components - such as containers, databases, HPC, and machine-learning utilities - for storing and processing data. The CSC and Aalto services are connected through the high-speed Funet network (Finnish University and Research Network). CSC coordinates the Finnish Grid and Cloud Infrastructure and operates the largest computing clusters in Finland.
CSC’s data center in Kajaani, Finland houses the pan-European pre-exascale supercomputer LUMI. This is one of the most eco-efficient data centers in the world. LUMI is using 100% hydro powered energy. The waste heat of LUMI will produce 20 percent of the district heat of the area and reduce the city’s annual carbon footprint by 12,400 tons. Further info at https://www.lumi-supercomputer.eu/sustainable-future/.
Focus on IT solution for remote and hybrid work
Aalto University provides IT solutions for remote and hybrid working. Secure digital workspaces for remote working are created through virtual and remote desktop infra and cloud tools, as well as online support and secure use of one’s own devices and applications. Aalto campus has specially designed (class)rooms with integrated and automated audiovisual technologies in support of hybrid meetings and teaching.
See also
Usage model and joining
Aalto Scientific Computing operates with a community stakeholder model and is administered by the School of Science. Schools, departments, and other units join and contribute resources to get a fair-share of the output. There are two different components to join:
HPC: Science-IT. Get a share of computing resources via the Triton computing cluster.
Aalto Research Software Engineers (RSE): Support of the RSE program provides intensive hands-on support and service for research software development.
For everyone
Aalto Scientific Computing already gets university-level support, so our computing resources are usable by anyone doing research at Aalto (with a limited share). By joining further, a unit gets something even more valuable: time. Our support for using our infrastructure is concentrated on member departments which provide joint staff with us or support the RSE program, in addition to a greater share of resources.
Staff network
There is no Aalto Scientific Computing, just people who want to make computing better.
You might be a department IT staff member, a lab engineer, or a skilled postdoc or doctoral candidate who helps other researchers with their technical/computational challenges. Why not join forces with our network of specialists? There is no “Aalto Scientific Computing” on paper, only different teams that work together to help researchers better than they could alone. We invite interested staff to join our community, help sessions, infrastructure development, etc. This program is still being developed (as of 2020), but it roughly includes:
Participation in admin meetings to help us develop infrastructure (e.g. Triton) in the best way for your users
Teaching: for example, ensuring our classes are suitable for your audience, teaching your own classes with our help via CodeRefinery, or directly helping us teach.
Co-maintenance of infrastructure (for example, your unit’s special software) on Triton and in our automated software deployment systems.
Learn how to solve your users’ problems more efficiently.
Networking and continual professional development
This is not just for IT support or administrative support, but high-quality research support that connects all aspects of modern work.
This does not replace local support, it just makes it more powerful.
Todo
How to take part.
Triton: computing and data storage resources
Triton is the Aalto computing cluster, for computationally and data-intensive research. Users from members of the community are allocated resources using a fair-share algorithm that guarantees a level of resources at least proportional to the stake, without the need for individual users to engage in separate application processes and billing.
Each participating department/unit funds a fraction of costs and is given an agreed share of resources. These discussions are carried out with the board of the Science-IT project. Based on this agreed share, units cover the running expenses of the project. There is also direct Aalto funding, which allows the entire Aalto community to access a share of Triton for free.
However, computing is not just hardware: support and training are just as critical. To provide support, each unit that is a full member of Science-IT is required to nominate a local support contact as their first contact point. Our staff tries to provide scientific computing support to units without a support contact on a best-effort basis (currently, that effort is good), but we must assume a basic level of knowledge and attendance at our training courses.
Interested parties may open discussion with Science-IT at any time. Using our standing procurement contracts, parties may order hardware to be integrated into our cluster with dedicated or priority access (or standalone usage), allowing you to take advantage of our extensive software stack and management expertise, with varying levels of dedicated access: a share of total compute time, partitions with priority access, private interactive nodes, and so on. Please contact us for details.
Scientific software: research software engineers
The Research Software Engineer program provides specialists in software and data, who can be contracted out to projects to provide close support. The goal is not just to perform a service, but to teach by hands-on mentoring.
For projects, the principle is that the project pays for help lasting more than a few hours or days. This can seamlessly come from project money as a researcher salary.
Units (departments, schools) can also join to get a basic service: their members can receive short-term support without any billing needed, and they also receive priority for the project services.
For more information, see the RSE for units page.
Contact
Let Mikko Hakala know about joining Science-IT, let Richard Darst know about the RSE program or SciComp community, or contact us at our scicomp@aalto.fi email address.
What we do
We don’t just provide computing hardware, but a complete package of infrastructure, training, and hands-on support. All of these three activities feed back into each other to improve the whole ecosystem.

We provide many types of services.
Our components, partners, and collaborators
Aalto Scientific Computing serves as a hub of computational science at Aalto. We guide researchers to the right service, regardless of who is providing it.
Science-IT serves as the coordinator and runs the Triton cluster, the physical hub of large-scale computational and data-intensive research at Aalto. Our many active collaborations are what allow us to guide researchers to the right resource, regardless of who provides it.
Science-IT
Science-IT (Aalto HPC)
Science-IT is the formal name of the project which provides the Triton computational cluster. It is funded by Aalto University, departments and schools, and the Academy of Finland. Perhaps a better description would be Aalto HPC (high-performance computing).
Science-IT is the “legal representation” of Aalto Scientific Computing within Aalto.
Computational research is one of the focus areas of Aalto University. The Science-IT project was founded in 2009 to facilitate the computational infrastructure needed in top-tier scientific research. Many Centres of Excellence and departments at Aalto University use our resources with great success. Science-IT is administered from the School of Science, and direct Aalto-level funding enables the use of our resources across all of Aalto University, free of charge.
Our services
In Science-IT, we concentrate on mid-range computing and the special resources needed by researchers in the School of Science. With local resources, we can provide high-quality support and even research-project-level customization. Because our resources are integrated into the Aalto IT environment, with regular local training in scientific computing practice for entry-level users, our resources enjoy an ease of access and lower barrier to entry than, for example, CSC HPC resources. We are also a basic research infrastructure, enabling the integration of separately purchased resources into our cluster and storage environments, with dedicated access for the purchaser.
Membership
Departments and schools can join the Science-IT project and receive a share of our resources and dedicated staff support. Please contact Mikko Hakala for details.
Science-IT Management
Science-IT is managed by the board: prof. Harri Lähdesmäki (head), prof. Adam Foster, prof. Mikko Kurimo, prof. Petteri Kaski.
Operational team: Mikko Hakala, D.Sc. (Tech), Ivan Degtyarenko, D.Sc. (Tech), Richard Darst (Ph.D.), Simo Tuomisto (M.Sc), Enrico Glerean (Ph.D).
To get additional information or to get involved, please contact one of the board members above (firstname.lastname@aalto.fi).
Science-IT is the organizational manifestation of Aalto Scientific Computing.
Our team is mainly known for providing the Triton cluster, a mid-range HPC cluster with ~10000 CPUs, 5PB storage capacity, Infiniband network, and ~150 NVIDIA GPUs for deep learning and artificial intelligence research. We provide a Jupyter Notebook based interface to enable light computing with less initial knowledge required to make our services easily accessible to everyone. Our team also works with the CS, NBE, and PHYS departments to provide data storage and a seamless computational research experience. We maintain http://scicomp.aalto.fi, the central hub for scientific computing instructions and have a continuous training program, Scientific Computing in Practice.
Computer Science, Physics, and Neuroscience and Biomedical Engineering
These departments are members of Science-IT, and their local IT staff provide a great deal of scientific computing support; in fact, the entire Science-IT team above is contained here. These departments’ resources are seamlessly integrated with Aalto’s HPC resources.
Computer Science IT
Computer Science IT provides advanced computing, data, and IT services to the Department of Computer Science. Ten years ago, we focused on daily infrastructure and devices. We still do that, but we now serve a far broader mission, including teaching and services, data management, specialised research tools, and cloud services.
Our services
We:
Handle daily device and infrastructure needs.
Develop and maintain department services, such as jupyter.cs.aalto.fi or the department services database lapa.aalto.fi.
Help co-maintain other platforms developed by researchers or teachers.
Provide services for managing the department’s research data.
Provide advanced consultation for IT needs for research.
… but most basic IT tools are handled by Aalto IT Services, not us. We build on their work and make sure research and teaching go as quickly as possible.
(also note, we don’t primarily serve CS undergraduate students)
Work for CS-IT
We are always looking for students interested in IT, programming, and system administration. We also are a good place for civil service. The most important prerequisites are a good understanding of Linux and a never-ending desire to learn more. Buzzwords you are likely to become familiar with/useful skills to have:
Kubernetes, docker, and virtual machines
Web service development
Puppet (and Ansible)
Data and storage systems
Computer hardware, building high-performance workstations
Contact
You can always drop by room A243 if we are there (not during covid-19, please) or join the daily online garage, or contact us by the email address findable on our internal wiki.
See our members on the About Aalto Scientific Computing page.
Partners
We are a leading member of the Finnish Grid and Cloud Infrastructure (FGCI), a university consortium to support mid-range computing in universities. FGCI, via Academy of Finland research infrastructure grants, funds a large portion of our work. Thus, we maintain ties to most other universities in Finland as well as CSC, the national academic computing center. Through the FGCI, we provide grid computing access across all of Finland and Europe.
Our team overlaps with the Departments of Computer Science, Neuroscience and Biomedical Engineering, and Applied Physics. The IT groups in these departments provide advanced Triton support.
We maintain close collaboration with Aalto University IT Services (ITS). We are not a part of ITS, but work closely with them as the computational arm of IT Services. ITS provides the base which we repackage and build on for many of our services.
Our team maintains ties to Aalto Research and Innovation Services to guide data and research policy. Triton is an Aalto-level research infrastructure. Our staff is involved in research policy making, including ethical, data security, and data management. Our team contains several Aalto Data Agents.
We partner with CodeRefinery, a Nordic consortium for training scientists, to provide training and support computational competence.
Who we are
This table lists people supporting scientific computing at Aalto University who consider themselves part of ASC. If you want to be added here, let us know. We welcome all contributors. There is no Aalto Scientific Computing, just people who want to make computing better.
Name | Affiliations | Specialties
---|---|---
 | Science-IT, CS-IT, Data Agents, Aalto RSE | Data science, Triton, teaching, usability
Ivan Tervanto | Science-IT, PHYS-IT | Triton, HPC hardware, HPC OS, teaching
Enrico Glerean | Science-IT, NBE, Data Agents, Aalto ethics committee | Triton, ethics and personal data, data
 | Science-IT, PHYS-IT | Triton, HPC OS, parallel software
Jarno Rantaharju | Science-IT, Aalto RSE | Software development, HPC software and optimization, profiling
Thomas Pfau | Science-IT, Aalto RSE | Software development, Matlab, linear/mixed integer programming, constraint-based metabolic modelling
Simo Tuomisto | Science-IT, CS-IT | Software development, HPC software design and optimization, GPU computing
Mikko Hakala | Science-IT, CS-IT, NBE-IT | Triton, data storage systems, HPC administration
Scientific outputs
Most of the computationally intensive research outputs from our member departments use our resources. In addition, at least the CS and NBE departments use our data storage for most big-data projects. You may view our research outputs at research.aalto.fi (Science-IT infrastructure section).
Current research areas
Our users come from countless research areas:
Method development
Computational materials research
Network research
Neuroscience
Data mining
Deep learning and artificial intelligence
Big data analysis
FCCI Tech Seminar series
We have an occasional seminar series, open to all, on how we run our group, FCCI Tech. Our archive may be interesting to other scientific computing teams and research software engineers.
FCCI Tech (fka Behind Triton)
This is a series of talks about scientific computing support and HPC infrastructure administration in practice. It started as an internal kickstart for new members of our staff, but the scope has expanded, and now others interested in research infrastructure are invited, though our orientation is still primarily toward our own team. Typical attendees are computational research engineers, scientific computing support staff, or HPC cluster/SciComp admins.
In the future, this may turn into a more general “research engineering” seminar series, once we are done with internal explanations. Guest speakers are welcome. The name stands for “Finnish Computing Competence Infrastructure Tech”.
We share what our practices are, what we have learned, and informally discuss.
Practicalities
Time: The next speaker announces the time/date of the seminar the week before and sends an invitation with the Zoom link. Usually Fridays at 10:00 EET.
Duration: as desired; roughly 60-minute time slots, with plenty of time for questions and discussion.
Location: Zoom, ask for an invitation but it is usually the garage link.
Recordings: You can view a playlist of some videos on youtube (and a few more are available to our team internally).
It is not a right but a privilege to participate. Free.
Past and currently planned
User support
As infrastructure providers, we are often thrust into a user support role (as well as a teaching role). We should look at this as a good thing: support of top-level science requires an intimate connection to the tools to do that science. I see that as part of our plan.
This talk is about Aalto Scientific Computing’s user support. It is designed as much to explain our philosophy of user support as it is to talk about specific tools. It takes a critical view of some existing common practices, as discussed in CodeRefinery/NordicHPC channels.
Broad contents:
What does “user support” even mean?
AaltoSciComp’s lines of user support
Strategic risks and considerations

(Figure: The three roles of Aalto Scientific Computing are all interdependent on one another.)
About us
We are Aalto Scientific Computing - Science-IT (HPC) - Department IT (CS, NBE, PHYS) - Close collaborations with Aalto ITS, CSC, FCCI
Our collaboration used to be called the “Finnish Grid and Cloud Infrastructure” and will now be called the “Finnish Computing Competence Infrastructure”, so user support is clearly more important than ever.
We are proud of our user support, but it is a multi-faceted approach which requires the right mindset.
Role of user support in scientific computing
User support has a bad reputation
Customers often think it is really bad (the support staff hate me!)
Support staff often hate doing it (the customers don’t know anything!)
Our term “issue” or “ticket” implies it’s a discrete task that you want to end as soon as possible.
Why?
Technology is hard
Users usually don’t give enough information to solve the issue.
… Users don’t even know how to give enough information.
We often pick up slack when something isn’t otherwise taught
We are disconnected from the user community
User support may be some forced extra thing on top of our “real” job.
Types of support
How do we even answer questions people may have? Some issues are system bugs that are our action items, but when the user themself needs help we can make some hierarchy of support strategies:
a. “read the manual: <link>”
b. tell them what to do
c. give them a live demo
d. pair-program a working example, you lead
e. do the task for them, no need to teach
Lower letters are faster to answer and are traditional support. Higher letters are much more time-consuming, and approach mentoring or Research Software Engineering services.
Why is support hard?
“Crisis of computing”: most users’ skills are much less than needed.
User interfaces are usually bad
Lots of hidden internal state
XY problem
People ask for what they think they need (X)
They are given X
X isn’t even a good way of doing what they actually want (Y), but we spend a huge amount of time doing X, when the right way Z→Y is much simpler.
XY problem (wikipedia): people don’t ask for the end goal, but some intermediate step.
XY solution (my term): the support person wants to answer X because it requires less investigation and the ticket can be closed, even though they suspect it’s not a good idea.
Be motivating
“How to help someone use a computer” by Phil Agre: https://www.librarian.net/stax/4965/how-to-help-someone-use-a-computer-by-phil-agre/
Hanlon’s razor: “never attribute to malice that which is adequately explained by stupidity”
In our case: never attribute to malice or stupidity that which is adequately explained by never having been told something obvious.
Avoid expressing unhappiness, displeasure, a condescending attitude, expectation that they should have known better, “damage”, etc.
Resist the temptation to blame the user. If they actually can do something that harms others, it’s the system’s fault. If they don’t know something, the UI is bad or society’s preparation is not enough. Etc.
SciComp’s user support tools
Our general guidelines
“help page”, scicomp.aalto.fi/help
Describes what to do in general, key points to mention when making a request.
It links to a longer “how to ask for help”
Both can be a bit patronizing to link to during an issue, so we have to be careful.
Docs
https://scicomp.aalto.fi (this site)
Open-source (CC-BY), public
Built with Sphinx
Findable by general web search. This is a big deal - don’t hide your docs!
Managed by git on Github
There will be another talk on specific Sphinx information later.
Gitlab issue tracker
We use Aalto Gitlab (version.aalto.fi) as issue tracker
University single-sign on
“Internal” permissions (anyone who can log in)
Common interface, reasonably powerful labelling, searching, etc.
When is an issue closed? As soon as possible, or when you are sure they are happy?
We are too much “when we are sure they are happy”, which often is “never”
Closing too soon discourages asking for help.
Is issue the right term here, or is conversation the right term?
Email tracker
Email is a bad medium; advanced issues should be public so that users can learn from each other and we don’t have to type the same thing over and over.
Low threshold to direct to the issue tracker instead of email.
Most users know this and we get few emails
Aalto IT services uses Efecte, CS uses its own RT (much nicer).
Three groups: scicomp, scip (teaching), rse-group (RSE services).
Daily Garage
Online “office hours” via Zoom
Every day, 13-14. If no one comes, it’s admin chat time.
Amazingly good for keeping a community going.
Chat
Is chat a good idea or does it get out of hand? Remains to be seen
Current philosophy: we need to build community. Chat is not for issues, but for chatting and for determining whether something should be an issue or not.
Uses Aalto-hosted Zulipchat. Believe us, just don’t use Slack.
Office drop-in
Not done in pandemic time, obviously
Mostly replaced by “daily garage” which is better anyway
Our offices are spread around the departments we serve, and we accept drop-ins anytime we are there.
This keeps us closely connected to the community.
Personal networks
Most of us came from the departments we serve now
Our existing networks are a good way of contacting us
Teaching
You can’t just answer questions as they come in; you need to teach proactively.
Our teaching is open and free.
Low threshold to direct to existing material rather than answering new question. Close support ↔ teaching connection.
CodeRefinery is a Nordic teaching collaboration.
Private email
I (rkdarst) really discourage this and always direct people to one of the tracked means.
My phrasing “If you send it to me personally, I am almost certain to eventually forget to reply, and I may not be the person who can best answer you anyway.” Then I usually try to give some sort of an attempt at an answer, since I have to give the appearance that I really care.
Strategic vision of support
Support ↔ teaching ↔ RSE
Support: one-to-one answering questions
Teaching: one-to-many improving skills
Research Software Engineering: one-to-few “I will do it for you” or “Let me get you started”
Strategic risks
The middle layer of science always gets cut first: when funding goes down, support will get cut and researchers left more alone.
Our load increases, and our funding doesn’t
We become unhappy, support level goes down
Emphasis increases on speed of closing tickets
Strategic benefits of good support
These can be used to argue for good funding of our teams:
Diversity
Open science
Without good user skills, people can’t make their computational work reproducible or shareable.
We need to claim our place in this problem, rather than let it go to administrative Open Science staff.
Exercise: problematic situations
Someone emails you privately about something they have clearly not even tried yet.
A new researcher is trying to use Triton to do some machine learning. They are trying to use Python+Jupyter, but have minimal experience managing a Python environment.
Conclusions
Open questions
What do you think?
Do we have too many lines of support?
See also
SciComp’s User help page
Richard Darst’s talk on Support services vs diversity
How to ask for help with supercomputers, the counterpoint of this from the user perspective.
#NordicHPC threads on CodeRefinery chat, which has provided many ideas
How to write good support requests, by Sigma2 (Norway)
Credits
Author/editor: Richard Darst
Thanks to Radovan Bast, Anne Fouilloux, and others in the CodeRefinery NordicHPC channel for good discussions.
Technical documentation with Sphinx
This talk explains how one can use Sphinx for technical documentation, in particular for this very site, scicomp.aalto.fi. The focus is to give an overview for contributing to this site (or similar ones), but it will also provide a strong basis for creating such a site yourself.
See also
About this site for a quick guide for editing this site.
Basics
scicomp.aalto.fi
Home of Aalto Scientific Computing’s documentation
Before 2017, it was Triton's documentation, using Confluence (wiki software)
Now has information on many different topics about scientific computing.
Rather highly ranked in search engines.
Converted from wiki.aalto.fi (Triton) using ``_meta/confluence2html.py`` and then pandoc to convert HTML→ReST.
CC-BY license agreed at that time.
Properties of good documentation
Organized, easy to use
Versioned
Anyone can contribute
Shareable, reusable, licensed
No lock-in, can migrate later
Plain text, so 50 years of text-processing development (``grep``, ``sed``, etc.) all work.
Not standalone: can integrate with other materials (e.g. ``literalinclude``, as sketched below).
git? (naturally comes out of the above)
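For example, ``literalinclude`` can pull a source file directly into a page, so the documented code never drifts from the runnable code. A minimal sketch (the path and options here are illustrative, not taken from this site):

.. literalinclude:: /triton/examples/hello.py
   :language: python
   :lines: 1-10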
The basic documentation stack
Git repository
Hosted on Github
Documentation written in ReStructured Text or MyST-Markdown
Built with Sphinx
With various extensions
Hosted on ReadTheDocs
GitHub actions validate basic syntax
Demo: making a change
I want to add the Journal of Open Source Software (JOSS) review checklist (https://joss.readthedocs.io/en/latest/review_checklist.html) to the RSE checklists section (https://scicomp.aalto.fi/rse/#checklists).
Through this, we will see:
Git repository layout
ReStructructured Text format
Sphinx table of contents directives (``toctree``)
Creating a pull request with git-pr
Reviewing the pull request
Merging
See the rendered version.
Building the site
It has a ``requirements.txt`` like a normal Python project. Until recently, it was buildable with stock Debian/Ubuntu packages; now it may require custom extensions.
``conf.py`` contains all configuration.
``index.rst`` is the root of all docs.
``Makefile`` builds it:
``make html`` to make it
``make clean html`` to rebuild
``make clean check`` to build and check for any errors
``sphinx-autobuild . _build/html/`` may be useful - starts a web server that automatically reloads on changes.
View results in ``_build/``.
Editing on the web
The Github web interface is suitable for making simple changes.
You can either directly commit or open a PR.
Can we use this more?
Sphinx toctree (table of contents tree)
The ``toctree`` directive is the fundamental building block of the site. It organizes documents into a tree, and that tree is used to make the sidebar. This directive can be put into any page.
Example:

.. toctree::
   :maxdepth: 2

   aalto/*
   data/index
   README

Example: follow it from ``index.rst`` → ``aalto/index.rst`` → ``aalto/jupyterhub.rst`` → ``aalto/jupyterhub-instructors/index.rst`` → various subpages. It makes sense, but for complicated cases I often do trial and error.
Arrangement of the site
scicomp.aalto.fi started from the Triton wiki
It then grew top-level sections for Aalto, Triton, Data, Training, RSE, etc.
It is about time that we rethink how it is organized.
rkdarst currently has the overall picture in mind - consult him about big changes.
Other details
Sphinx
Sphinx is a full-fledged extendable documentation generator
We use many extensions, such as ``sphinx_gitstamp``, ``sphinx-{copybutton,tabs,togglebutton}``, and ``sphinx_rtd_theme`` (a configuration sketch follows this list).
Custom Javascript and CSS in ``_static``.
Very useful to know for other projects in general.
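As a sketch, extensions like these are enabled in ``conf.py`` roughly as follows (the module names are our best guess at how each package registers itself - check each package's docs; this is not the site's literal configuration):

# conf.py (sketch)
extensions = [
    "sphinx_gitstamp",      # per-page "last modified" date from git
    "sphinx_copybutton",    # adds a copy button to code blocks
    "sphinx_togglebutton",  # collapsible/dropdown content
]
html_theme = "sphinx_rtd_theme"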
ReStructured Text syntax
Why ReST? Because it's not a thin mapping onto HTML like Markdown.
Markdown is syntactic substitution; ReST is semantic meaning.
MyST is now a reasonable alternative, but it is better thought of as an alternative ReST syntax than as Markdown.
See syntax quickstart at https://scicomp.aalto.fi/README/
https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html
Most surprising ReST points:
Double backquotes for literals:
Run ``nano`` to begin
(configurable)
Links are scoped:
:doc:`/triton/index` :ref:`tutorials`
(configurable)
Two trailing underscores on raw links:
The main `Aalto website <https://aalto.fi/>`__
Github Action checks
``make clean check`` will warn about the same errors that Github will fail on.
Github provides error tracking for pushes and pull requests (demo?).
Example failure:
I purposely keep the checks rather strict and have disabled some options that would allow more flexible ReST: "explicit is better than implicit".
ReadTheDocs
https://readthedocs.org provides a management interface for the docs
There is a joint aalto-scicomp account to manage it
Demo if time, but pretty much self-explanatory
Occasionally a build fails for no reason and rkdarst needs to go wipe and rebuild, or fix dependency versions.
Little-known features
We could use Markdown or Jupyter
Via MyST-parser or MyST-nb for Jupyter.
They all work together in the same site.
ReST is really nicer for this than shoving directives into Commonmark.
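To illustrate, the same admonition in both syntaxes (a minimal sketch):

.. note::

   Remember to load the module first.

and in MyST-Markdown:

```{note}
Remember to load the module first.
```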
Compatible with many other projects
Standard documentation system for many projects
Used in recent CodeRefinery lessons, for example
Minipres
Turn any site into a presentation
Demo: https://scicomp.aalto.fi/tech/sphinx-docs/?minipres&h=3
Can anyone help do this properly?
Redirect to HTTPS
ReadTheDocs doesn’t natively do this for external domains
Done via Javascript
Can anyone improve?
Other output formats
Sphinx can output to PDF, single-page HTML, epub, manual pages, and more.
Can anyone think of a use for this?
Substitution extension
Written for Hands-on Scientific Computing
sphinx-gitstamp
The bottom of every page lists the date that exact page was last modified.
Open questions
Pull requests or not?
When should we use pull requests? When should we push directly?
In practice both are fine, up to you to decide what you want
rkdarst believes that, if you aren’t sure, push directly and ask for review.
Others at Aalto can use scicomp.aalto.fi
Should we encourage others to join our project here?
Testable docs
Our dream would be to make examples in a testable form, where one can automatically run them all and find errors.
For example, this python-openmp example includes everything needed to submit and run the file.
Can this be automatically tested? A bit too complex for the typical doctest.
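For the simplest, pure-Python cases, something like ``sphinx.ext.doctest`` could be a starting point (a sketch assuming that extension were enabled; it cannot cover examples that need a live Slurm cluster):

.. doctest::

   >>> 1 + 2
   3

Running ``sphinx-build -b doctest`` would then execute and verify every such block.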
Integrated HPC-examples
We have two example locations:
The second (hpc-examples) could be included as a submodule to reduce duplication, and users can also clone it during courses.
Don’t use ReadTheDocs anymore?
Github Actions + GitHub Pages or other hosting sites would work instead of ReadTheDocs now.
How can we keep things up to date?
Requires continuous work, like any docs.
What should the threshold be for removing old material?
We now have a last updated time at the top.
We clearly need to think about this more.
Visitor stats
ReadTheDocs provides limited stats based on web server logs.
rkdarst is against detailed web tracking.
Can we find a way to get both?
2022 update: we have Plausible analytics which is sufficiently anonymous.
Building a community
How can we get more people to contribute?
Online work and support
See also
Our garage description page for users: Scicomp garage. (This is an internal description page)
Since 2020, Aalto Scientific Computing has worked online. Since we are a distributed team supporting users in many locations, this has improved our work in numerous ways. Online work gives us:
A way to interact with users regardless of physical distribution.
Higher-quality, continuous interaction (including better onboarding).
Better work-life balance and adaptability to different lifestyles.
Since 2017, we have had weekly office hours called “garage”. Since 2020, they have been online and revolutionized our support. The garage gives us:
A standard way to help users interactively, without the burden of scheduling meetings.
A “social time” in the middle of the workday to chat with each other.
By combining the above two, we can chat about useful things relevant to our work, handle many internal meetings that would otherwise have to be scheduled, share knowledge better, and in general provide the spontaneous interaction that everyone claims is missing from remote work.
rkdarst’s principles of remote work
The online garage helps with at least two of rkdarst’s principles of remote work, and part of a third:
No private messages (allow others to know what you are doing)
Don’t schedule meetings (use standard meeting locations and talk spontaneously)
Work in public (make it possible for others to join you)
How it works: “garage” support session
We have an announced time: 13:00, every workday.
Users and staff join the meeting during the scheduled time
We don’t promise any service level - some days, it could be that users arrive but there are no staff. But this has become so integral to our work that it never happens.
Users ask their question and we do initial triage
We help either in the main room or breakout rooms. (for example, main room for a question where there is one user and the topic is a good discussion point for everyone on the team)
At least one staff helps. Usually, we try to have two helping: one who knows and one who is learning the topic. (This is very useful “on the job training”.)
Staff don’t have to be always-on. It is usual for many staff to be working on their other work, passively listening in case something interesting comes up or someone says their name.
Staff can also easily be called to the meeting using chat.
Very low threshold for screensharing.
"Remote control" is very useful: it is a middle ground between telling people commands to type (extremely slow and demotivating when someone has no idea what to do) and taking over their computer (demotivating and information-hiding in another way), since the user can easily and actively see what is going on.
Often in other issues, when the actual problem is unclear, we will say “Let’s talk in garage” rather than try to debug by asynchronous chat. Since garage is so frequent, this feels good.
You can read our "support flowchart" from Help.
Technical setup
There is one recurring Zoom meeting
Meeting schedule = “recurring at no fixed time” option.
Everyone on our team is a co-host (must be in same Zoom organization)
The first co-host to join becomes the meeting host.
Any co-host can open breakout rooms or assign customers to breakout rooms. But for the most part we tell users where to go and they go themselves.
Normally, the first person to need breakout rooms opens an excess number, such as 10, selects "allow participants to choose", clicks "open", and takes no further management action.
Some people may initially use chat to ask their question (the dispatcher can also send these initial questions by chat). This is especially good as a second conversation while one problem is being discussed.
Zoom trolls have never been a problem, even though the link is public. One hypothesis is that by not listing specific dates on the webpage, it is not a findable target by someone looking for “where to troll now?”.
Typical procedures
Usually one person is the effective "dispatcher": they make sure that everyone is greeted and take a basic description of each problem. They make sure that people are handled, call in the best supporter, etc. (After a team gets enough experience, this role becomes implicit.)
How it works: internal meetings
The garage room is actually our only meeting room for all normal team meetings
For example, we have our weekly team meeting right before the garage one day of the week.
This keeps meetings on schedule and provides a day when we can be sure most people are at garage, for the hardest questions
We would enable the waiting room towards the end of these meetings (but normally we don't use the waiting room).
Even other meetings, such as two people discussing something, happen in this room.
Worst case: meetings overlap or run into each other. But this is actually good - doesn't everyone complain that you don't spontaneously meet people online? We split into breakout rooms and manage. Sometimes we have even made important connections this way.
This isn’t just for users - other staff teams can come talk to us during this time. Basically, it replaces a lot of the overhead with any meeting with us.
Online-default meetings are great for work-life balance, especially for people with families.
Chat or other asynchronous text-based communication is a requirement for inclusive meetings. It allows anyone to contribute ideas without waiting for a pause, and more than makes up for any online awkwardness. (The “meeting agenda” below can also serve this purpose).
Meetings are managed with a Google Docs agenda.
Each week, a new heading is made, and it collects topics for the next meeting. There is no running through a list of ongoing projects and hearing "still going on"; every agenda item has been actively placed by someone over the last week who actively needs thoughts and a decision from the rest of the team.
Someone screenshares the agenda. Instead of needing to find a pause to talk, people can write information/thoughts directly into the agenda, so meetings scale better. People can write information already in advance of the meeting, to focus the meeting on discussion and not sharing information.
Everyone should have the agenda open themselves so they can see, scroll, and contribute - a meeting is no longer just voice talking!
The meeting agenda can also serve as chat - if someone wants to say something but can't find a time to use voice, they write it there directly as a point.
If you want, you can expect everyone to write down their most important points and summaries directly in the agenda themselves (instead of delegating that to a designated note-taker). This is more fair, allows everyone to write their notes in their own words, emphasizes their most important points (unimportant points aren't written), and gives others time to talk.
It is only one running document (not a new one each week). New weeks are added to the top (since top loads first). Attendees can easily scroll down to refer to past weeks.
This strategy has revolutionized our meetings. Other meetings have much more of a “this meeting should have been an email” feeling after this. (In no small part because the “this should have been an email” parts get written and read by everyone, with only a short mention if that’s all it needs).
How it works: general common space
If two people are text-chatting and need to talk in person, there is zero overhead. One simply asks “Zoom now?”, the other confirms, and they know exactly where to go. Or the answer might be “Garage tomorrow?”
This space is also used for random coffee breaks, etc., which are usually spontaneously announced.
In theory, especially when we are onboarding people, this can be a generic hangout space during downtime. You might meet someone there and chat and learn something.
In short, the meeting is the “commons” of “caves and commons”.
Problems with in-person office hours / garage
People have to bring their own laptop. When someone works on a power desktop, they can’t bring it.
No screen-sharing. People are crowded around one computer looking at it.
You can't type on their computer without taking it away from them. With screen sharing, if you use "remote control", at least they can clearly see and feel in control.
Really hard to have multiple supporters with one customer.
From your main workspace, you hopefully have multiple screens. One screen can be the screenshare while the other is your own debugging/testing work.
For individual-person office hours, or even an open office policy, someone may come by and the best person to answer may not be there, may be in another building, etc.
Even if they are there, one-on-one support doesn’t give the “on-the-job training” to other team members.
“Open door policy” makes for constant distractions.
In-person garage tends to be limited to once a week, since everyone has to go there. Staff leave their main workspace, so can’t work as efficiently. Online, it is completely reasonable to be working on other work while muted/video off and passively listening in case something useful comes up.
Open questions
What is the largest size team for which this works? What happens when we go over that?
What's the best frequency? We really think that every day works best for this kind of thing within a team.
Mixing different teams in general: how different can teams be and still use the same garage/standard meeting room?
If multiple teams have separate garages, should they be at the same time or different? Combined? (does it get too big?)
Is it even possible for one person to have multiple garages they need to keep in their mind - or is it a “one-per-person” kind of thing?
How many garages can someone attend (as staff) before it becomes “too much”.
Is there a better tech than Zoom? In 2022, it works much better than early 2021, and at least people can join via browser.
When people start working in-office again, how does this continue? (People have started, and Garage seems to be a permanent culture shift. But it helps that our offices are distributed around).
Proposal:
Flip it around: don't look at it as "how to scale garage to more staff". Scale communities to the size that can be supported by a garage, then make more communities as needed, each with their own support infrastructure.
So garages contain 5-15 supporters, and the communities perhaps several hundreds. The communities can overlap/be virtual inside of organization units.
The support staff within the garages network between communities on the support/tool side, so that they are aware of the broader environment and can direct the members to other garages as needed.
The future
Coordinated garages across different teams? At the same time or different?
Some sort of cross-organization garage sessions. But, is something only once a week good enough to support continuous work? Does it work as a starting point, then you direct the user to your own specific daily garage?
Recommendations for how to implement your garage
(I’m not sure what to say here, that isn’t already said or implied above. Any ideas?)
See also
Our help page
List of garages
Why the name?
I think it came from another Aalto team that held a “travel garage”. Unsure where they got the name from or if there is a better name.
How to actually respond to user support requests?
I’ve been asked before, “how do you actually respond to customer support requests?”. There are some obvious answers (be polite, try to answer, etc), but are there any specific references for research computing / scientific computing support staff? This page collects my ideas after having done it formally and informally for years.
This page is specifically about making responses respectfully and with compassion for the requestors. It’s not designed to be a big-picture how-to of user support - there are plenty of other resources about that.
Unsorted notes:
one person takes the lead in communication
Start by talking with people about the big picture
their position
past work
what they expect to get out of the support
many questions are actually about:
the environment setup
Why care about how you respond?
An example:
When interviewing people once, we started our interviews with to-the-point factual information and questions. Our tone of voice was "bureaucratic", to say the least. Our interviewees responded in kind: with so little enthusiasm that we wondered if they even wanted the job.
We realized something had to change. Our next interviewees were greeted with enthusiasm and excitement about the job. The interviewees responded likewise, and we could more easily see how someone could perform.
Why is this important? Basically, the tone we set feeds how our users respond. Is computing a chore they hate? Is it something that's fascinating, even if not their main goal? Do they see working with us as the highlight of their day or a last resort? We need to set the right tone with our interactions. This is true in all of:
Our answers
Our requests for follow-up information
Outreach about our services
See also: Observer-expectancy effect and Clever Hans.
Levels of competence
Customers have all levels of existing competences and needs. The more you understand of this, the better you can assist - and it is needed to frame any response.
Understand the level that the requestor is at and the level they need to be at. (this is usually not apparent at first)
An answer far below their level is demeaning.
An answer far above their level is demotivating.
It can be hard to know the level to answer, so multiple levels of answer are useful: one general paragraph, then one more detailed paragraph properly connected. This also helps people advance up their level of confidence, but needs more writing.
Aalto SciComp's Bloom's taxonomy of scientific computing skills may help to guide your thoughts in evaluating this.
Discuss: Is it better to assume at too low a level or too high? How can we find the right level to answer at?
XY problem
XY problem: someone asks about their attempted solution (Y) and not their root problem (X). If a supporter focuses on the Y and not the X, the answers can be very inefficient.
Example: "How do I turn on the stove?" vs "I am trying to make tea, how do I turn on the stove?", which allows the answer to point out that the asker is trying to use an electric kettle on the stove.
Don’t assume that what someone asks for is what they really need - you need to read between the lines.
This isn’t their fault, maybe they don’t know what they need.
Possible mitigations:
When replying, state your assumptions in your response so that they can correct you if they notice something wrong (if this is relevant).
Also consider stating several other possibilities briefly, and when they would be relevant. For example: “Do XXX to install the software. But do you know that you can also load it via the YYY module?”
General guidelines
Think about what the underlying need is (X, not the Y)
Be verbose (or at least not short).
If your answer is “no”, it feels better to say it with many words, rather than few.
Verbosity is a sign of engagement, which makes the customer feel respected no matter if the verbosity is useful to them or not.
Be especially cautious about answers that are just a link to the documentation - unless they are specifically asking for that. Even then, try putting it in context.
Service gesture: something more than people expect (beyond the minimum that they asked). (example: try harder to find someone who can answer, point them to that person.)
Know your audience
The more you know about the person's actual work, the faster and better you can answer questions.
This is a more direct lesson for the people managing support, but can you do anything about it yourself, too?
Consider at what level someone needs support
Do they need single answers to a question?
Are they very lost and need to work with someone to implement it?
If you answer small questions piece-by-piece this is inefficient hill-climbing.
Direct to an RSE service for more support?
Do they need a tutorial, reference, theoretical explanation, or how-to (the 4 types of docs)? These are all very different types of answers or links.
Accept that you can’t do everything
Make this decision explicit, not implicit.
An implicit decision here means it is made based on internal biases.
Better to discuss among the team to make sure it is consistent.
Document what you do know and learn while working, even if you don’t have the full answer yet.
Yes, this can be a rather hard thing to do: we don’t want to give a partial or possibly wrong answer.
On the other hand, being silent for days or weeks until you have the proper answer really doesn’t help anyone. With the rate of research, they have probably even gone on to something else!
Consider if you should keep the requestor in the loop (generally yes, probably good, but qualify if something is still in progress and may not work).
This also helps any future staff who may pick up after you. So, even if you don’t document to the requestor, document internally.
Try to avoid long silences before any replies, for example if you don’t even know who can answer. This can be especially hard without a front desk or if you think “just a bit more and we’ll know something”.
Giving bad news
Sometimes you have to say “no”
Again, be more verbose rather than less
Acknowledge the X and the Y of the initial request, so that they know the request really isn’t possible (rather than “you not understanding”).
State why it's not possible, in however many words it takes.
Can you turn this into an X-Y answer - find what they really need, that you (or someone) can do?
If you don’t know the answer
Our audience does all kinds of advanced work, so often we don’t know the answer - or don’t know it right away.
Ask to see what they actually do, all error messages, etc. Ask to share screen. This can help you to see some problems, and makes most problems easy.
Request the basic information so you can "work on it yourself for a bit to save time"; this gives you enough time to study solutions.
Related to the above, take the time to make things reproducible. This is needed for you to begin working, but also seeing the basic steps will help to understand the background.
Dealing with mis-directed issues
It can be frustrating when someone asks in the wrong place
Be nicer than just saying "no": since you have presumably already understood what the issue is, you can actually give useful pointers to where to ask next. This itself may be a useful answer to them.
Can you give keywords / a copy-paste text that explains the actual problem, which they can send to the other support you are now directing them to? This:
Saves the other staff time (they don't have to do the X-Y analysis themselves)
Saves the customer time in thinking about what to say
Makes the customer feel valued and validated
Communication strategies
Communicate with respect. Informal is probably OK, but know your audience.
Sarcasm is usually bad (but we should have already known it's bad online). Even if you think the person reading now will get it, what about all the people in the future who might read and rely on the same answer?
In-person or synchronous support
See How to help someone use a computer for many ideas that are relevant to in-person support (and more).
When you learn something, do you want to create an issue about it so that the knowledge can be used later?
Try to avoid simply taking over their computer and doing something. On the other hand, dictating something key-by-key can be equally frustrating. Try to let the user do as much as possible and clearly explain why you do some things yourself.
Does saying "I don't know, so it's hard for me to tell you what to do. But I can try to figure it out while you watch - is that good?" help?
Online support allows screen-sharing and remote control, which allows you to type but the other person to still feel like they are an important part of the process since they can see everything.
Ticketing system support
Is your ticket system public (e.g. Gitlab, internal to the organization but not private to your team) or private (requestors only see their own tickets)? You should answer respectfully either way, but this does matter: the more people who can see it, the more careful you should be, but also the more long-term benefit your answers have.
Document your intermediate progress at least as comments in the tickets, even if it's not appropriate to send to the user. (See above about silence.)
You want separate issues in separate tickets. Often, users will ask multiple things at once. You'll have to figure out what to do about it, but you should probably clearly say "more emails is better; don't worry about sending us three emails at the same time if they are different things".
Can you separate the issues yourself, instead of replying "please send this again"?
Private email support
Do you forward it to a ticket system? Information in private email always gets lost.
If you reply with only “please re-send this”, that can sound like you don’t want the issue in the first place. What do you do?
Plan for problem situations
Exercises:
How do you answer things such as the following? Write draft responses:
Not enough information
Possibly mis-directed
Something requestor should be able to do themselves?
Examples
(examples to be inserted here)
See also
Events are listed below in chronological order; in the left sidebar they are roughly sorted by usefulness to a broad audience (including events which have been drafted but not presented).
Triton hardware, Ivan Degtyarenko, Wed 3.3 2021, 10:00
Triton hardware-wise: machine room, different archs, IPMI, hardware troubleshooting
[Material includes sensitive data, can be provided on request]
Triton networking, Ivan Degtyarenko, Fri 12.3 2021, 10:15-11:15
Networking: IB and Ethernet setup, IB islands, troubleshooting
Internal video (material includes sensitive data, provided on request)
Ansible for FCCI, Mikko Hakala, Mon 22.3 2021, 14-15
Ansible, provisioning with OpenHPC, standalone servers
Internal video
User support in Aalto Scientific Computing, Richard Darst, Mon 29.3 2021, 14-15
User support made easy: different support levels by Science IT, docs, issue tracker, garage, etc.
Triton software stack, Simo Tuomisto, Fri 9.4 2021, 10:15-11:15
Triton / FCCI software stack: Spack, building software, …
Jupyter at Aalto, Richard Darst, Fri 30.4 2021, 10:15
Jupyter setup at Aalto jupyter.triton.aalto.fi, best practices.
Internal video (but it should be published)
Anaconda on Triton: automatic build system, Simo Tuomisto, Fri 7.5 2021, 10:15
Anaconda setup on Triton
Diversity in computational sciences vs university services
This wasn’t originally given in FCCI Tech but is relevant to the people reading this page.
Sphinx documentation, Richard Darst, Fri 14.5 2021, 10:15
Open and accessible documentation using Sphinx, RST/MyST, and Readthedocs: the story behind scicomp.aalto.fi.
ClusterStor, Andreas Muller (HPE), Tue 18.5 2021, 12:00
Storage systems: ClusterStor hardware and software behind Triton’s new /scratch. Maintenance, troubleshooting.
RSE service status update, Jarno Rantaharju, Marijn van Vliet, and Richard Darst, Fri 28.5 2021, 10:15
RSE program: spring 2021 summary. Impact we have made so far.
How we did Summer Kickstart 2021, Richard Darst + Reading + Video
Introduction to a Kubernetes deployment, Richard Darst, Fri 8.10 2021, 10:15
jupyter.cs, Richard Darst, Fri 19.11 2021, 10:00
Triton authentication, Mikko Hakala, Fri 26.11 2021, 10:15
Internal video
NetApp at Aalto: department admins guide, Pekka Alaruikka / Mika Kontiala, Fri 3.12 2021, 10:15
NetApp setup at Aalto
what department admins may and may not do with TeamWork
Practicalities: volumes, exports, qtrees, quotas, settings, permissions etc
(if time left) about backups on the TeamWork, troubleshooting, getting help, etc
High Performance Clusters at NVIDIA, Janne Blomqvist, Fri 10.12 2021, 10:15
NVIDIA cluster setup overview
Best practices of the HPC cluster maintenance
What we are doing wrong at FCCI compared to NVIDIA
The future of teaching: CodeRefinery teaching strategy Richard Darst, Fri 17.12 2021, 10:00
The role of teaching in CodeRefinery and Aalto Scientific Computing
Tools and strategies we use to successfully teach online: HackMD, streaming, helpers, teams, co-teaching, and more.
Future outlook and goals
Open OnDemand experience by Esko Järnfors et al. (CSC), Fri 17.12 2021, 12:00
NOTE: the second talk on the same Fri 17.12
Simple Kubernetes deployment by Richard Darst, Fri 3 Nov 2023
Demo: Publishing a Python Package by Jarno Rantaharju, Fri Jan 26th 2024
Demonstration of open source software publishing. I will take a part of an existing Python package and spin it off as a small stand-alone package. We will discuss what is needed for a software publication and recommended practices.
Proposed/requested future topics
SLURM setup, Simppa Äkäslompolo
Cluster monitoring, Simo/Mikko
Online courses and CodeRefinery, Richard Darst
Online work and support, Richard Darst
Respectfully and efficiently handling user support requests, Richard Darst
Science-IT data management: policies and procedures
Science-IT data management: storage systems and tech setup
History and structure of FCCI
Security
Send pull requests to this section to add more requests, or to the previous section to schedule a talk.
Other
Sustainability and Environment statement for the Triton HPC cluster
Building a sustainable future is the most important goal of our community and saving energy is one of the most significant actions that we can take to improve sustainability. Fast and large computational resources have high energy requirements, whether it is our Aalto High Performance Computing (HPC) cluster Triton or the workstation at your desk. At Aalto Scientific Computing / Science IT we take these things seriously and we believe that transparency in the energy consumption of our shared computational resources benefits both the users of our cluster as well as the general public.
In this statement, we first summarize the action points that we are implementing to improve the Triton HPC cluster's energy efficiency and list what you, as a Triton user, can do to improve the environmental impact of your computations. Then we describe the energy consumption of the computational nodes that form our HPC cluster, and we explain the energy-saving strategies implemented to save energy when the nodes are idle, and also when the national energy demand situation reported by Fingrid requires everyone to be more careful with energy consumption.
Important
What Aalto Scientific Computing / Science IT is doing to reduce energy consumption
Here are the main action points on what Aalto Scientific Computing / Science IT is doing to reduce energy consumption for the Triton HPC cluster.
Support for researchers to optimize their calculations: Our daily SciComp garage session and Research Software Engineering service provide ongoing support for making all computational and data-intensive work as efficient as possible.
Switching off nodes when the national energy demand is high (coming during the winter): Though the Triton HPC cluster will not be affected by Fingrid power cuts, we will reduce the number of active computational resources during periods of high demand according to Fingrid announcements. Triton is too small to directly participate in Fingrid’s relevant demand response program.
Acquiring newer hardware with better energy efficiency (ongoing): More energy efficient nodes are being acquired and they are already replacing older hardware.
Moving to a new datacenter with better power usage effectiveness (2024): A new colocation facility with better PUE has been chosen and we have started the work needed to switch to the new location.
What Triton users can do to reduce their energy consumption
The most important thing you can do is to make your computations as efficient as practical. Second, centrally-hosted compute infrastructure is generally much more efficient than standalone computing solutions. For any of the matters below, we offer extensive, immediate support in our daily garage (every day at 13:00). For significant cases, our Research Software Engineers can directly work with you to improve your workflow with minimal trouble to you.
Make your computations efficient. Make sure that a) your own code is as efficient as reasonable, and b) it fully uses the reserved HPC resources.
Not all code deserves to be fully optimized, but the more resources you use, the more you should think about optimizing.
When you work with HPC resources, your starting point for looking at the efficiency of your computations is ``seff <JOBID>``. For additional support, come to the daily garage mentioned above.
Save energy by using the centralized infrastructure: HPC computations are more efficient than your workstation - even before considering your workstation's idle time during development. Being a shared resource, you only use what you need.
Do you always need a GPU? While many computational tools offer faster computing times using GPUs, consider how much is gained by using a GPU versus CPUs. Roughly, the most expensive GPUs need 5 times more energy than a full 40-core CPU node (assuming the computations run at 100% efficiency), so if your computation is not at least 5 times faster on the GPU, consider avoiding the GPU and accepting a 2-to-4 times slower computation.
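A back-of-envelope check of that rule of thumb (a sketch with made-up runtimes; the 450 W CPU-node figure is quoted further down this page):

# Rough GPU-vs-CPU energy comparison (illustrative numbers only)
cpu_node_power_w = 450              # full 40-core CPU node, active
gpu_power_w = 5 * cpu_node_power_w  # "most expensive GPUs": the ~5x rule of thumb

cpu_runtime_h = 10                  # hypothetical runtime on the CPU node
gpu_runtime_h = 3                   # hypothetical runtime on the GPU (3.3x speedup)

cpu_energy_wh = cpu_node_power_w * cpu_runtime_h  # 4500 Wh
gpu_energy_wh = gpu_power_w * gpu_runtime_h       # 6750 Wh

# The 3.3x speedup is below the ~5x break-even point, so the GPU run
# finishes sooner but uses half again as much energy.
print(cpu_energy_wh, gpu_energy_wh)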
Controlled power cuts: the Triton HPC cluster will not be affected
As communicated by Fingrid here, there is a chance of national power cuts during the upcoming winter. The Triton cluster is colocated in a CSC machine room, which also hosts other nationally important infrastructure, and is not expected to be affected by the power cuts. In the case of unexpected outages, there is a backup generator. As for connectivity between the internet and Triton, Aalto IT Services has ensured that the physical switches providing remote access will also not be affected by power cuts.
Even though Triton should not be affected by power cuts, we will react to the national electricity supply and reduce the power consumed during these periods.
Energy consumption of Triton
For the first half of 2022, Triton’s average power was 214 kW (long-run average). This includes all compute nodes, GPUs, data storage, network, and other administrative servers. It does not include cooling.
A typical CPU node consumes around 450W when active and 60W when idling (Dell PowerEdge C6420, 40 CPU cores).
The newest GPU nodes use 2200W at peak use and average 1200W (Dell PowerEdge XE8545, 48 CPU cores and 4 NVIDIA A100 cards).
In general, Triton has a relatively high usage factor (on average above 90% in the year 2022), so there is minimal waste from idling. While our current machine room does not recover waste heat for district heating, our new machine room will be able to do so. Furthermore, we are constantly updating our hardware with new nodes with more efficient energy consumption. You can check further details about Triton’s hardware at this page.
For comparison, the minimum power to participate in Fingrid’s demand-response frequency restoration reserve market is 1MW.
Energy efficiency of the CSC colocation
The energy efficiency of colocation facilities is described by the Power Usage Effectiveness (PUE) ratio, determined by dividing the total amount of power entering a data center by the power used to run the IT equipment within it. Ideally, PUE should be as close to 1 as possible; the most efficient datacenters in the world report a PUE of 1.02 (reference) and the average datacenter has a PUE of 1.57 (average from a survey in 2021).
Triton is physically located at CSC colocation facilities with other servers supporting all researchers in Finland (e.g. the FUNET network). Our current colocation has a PUE of 1.3. This is not the state of the art, although it is better than the average datacenter around the world. Energy efficiency will be a very important criterion in the upcoming move to a new facility. The current tentative plan is to move Triton's hardware to a new colocation facility during 2023, to be ready for 2024.
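To make PUE concrete, a back-of-envelope calculation combining the numbers quoted on this page (a sketch, not official figures):

# PUE = total facility power / IT equipment power
it_power_kw = 214       # Triton's long-run average power (above)
pue = 1.3               # current CSC colocation

facility_power_kw = it_power_kw * pue           # ~278 kW total draw
overhead_kw = facility_power_kw - it_power_kw   # ~64 kW for cooling etc.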
Impact of Triton hardware purchases
Unlike many clusters, Triton does not build a new cluster every few years. Triton is continually upgraded, and old hardware is only discarded after it is actually obsolete (which usually comes due to excessive energy consumption relative to newer hardware). This allows us to adjust the e-waste/power consumption tradeoff dynamically, depending on the circumstances. We try to minimize the entire lifecycle impact of our cluster. Yes, Triton is a metaphorical Ship of Theseus.
Web accessibility
This website is partially conformant with the Web Content Accessibility Guidelines (WCAG) level AA.
This is the accessibility statement for the scicomp.aalto.fi website. The accessibility requirements are based on the Act on the Provision of Digital Services (306/2019).
But as we know from other Aalto websites, formal web accessibility doesn't mean a site is actually useful for any particular purpose. We strive to make this site actually usable by everyone, and we welcome any contributions to help us with that.
Accessibility status of the website
The Web Content Accessibility Guidelines (WCAG) defines requirements for designers and developers to improve accessibility for people with disabilities. Based on self-assessment with Web Accessibility Evaluation Tool, this website is partially conformant with WCAG 2.1 level AA on computers, tablets, and smartphones. Partially conformant means that some parts of the content do not fully conform to the accessibility standard.
Inaccessible content
Below is a description of known limitations, and potential solutions. Please contact us if you observe an issue not listed below.
Known limitations for scicomp.aalto.fi website:
Inclusion of PDF documents that might have accessibility issues.
Technical specifications
Accessibility of scicomp.aalto.fi website relies on the following technologies to work with the particular combination of web browser and any assistive technologies or plugins installed on your computer:
HTML
CSS
WAI-ARIA
These technologies are relied upon for conformance with the accessibility standards used.
Next steps for improving the accessibility
Please follow this issue to track updates and improvements to the accessibility of scicomp.aalto.fi.
Accessibility feedback
We welcome your feedback on the accessibility of scicomp.aalto.fi website. Please let us know if you encounter accessibility barriers on scicomp.aalto.fi website:
Phone: +358503841575
E-mail: scicomp@aalto.fi
Release and update information
This accessibility statement was last updated on 26 October 2020.
This website was launched on 15 June 2017.
This accessibility statement is based on a similar statement from Fairdata.fi.
About this site
These docs originally came from the Triton User Guide, but now serve as a general Aalto scientific computing guide. The intention is a good central resource for researchers, kept up to date by the whole community. Many parts are useful to the broader world, too. We encourage the community and the world to contribute whenever they see a need.
Sphinx is a static site generator - you can build the site on your own computer and browse the HTML. It’s automatically built and hosted by ReadTheDocs, but you don’t need to mess with that part. Github will validate basic syntax in pull requests.
See also
Technical documentation with Sphinx for an overview about how and why it’s set up like this.
Contributing
We welcome contributions via normal Github open source practices: send us a pull request.
This documentation is Open Source (CC-BY 4.0), and we welcome contributions from the community. The project is run on Github in the repository AaltoSciComp/scicomp-docs.
To contribute, you can always use the normal Github contribution mechanisms: make a pull request, issues, or comments. If you are at Aalto, you can also get direct write access. Make a github issue, then contact us in person/by email for us to confirm.
The worst contribution is the one that isn't made. Don't worry about making things perfect: send your improvement and someone can improve the syntax/writing/etc. as needed. This is also true for formatting errors - if you can't do ReStructuredText perfectly, just do your best (and pretend it's markdown, because all the basics are similar).
When you submit a change, there is continuous testing that will notify you of errors, so that you can make changes with confidence: “wiki rules: deploy and iterate” rather than “perfect before merge”.
Contributing gives agreement to use content under the licenses (CC-BY 4.0 or CC0 for examples).
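A minimal sketch of the pull-request route from the command line (the branch name and commit message are just examples):

$ git clone https://github.com/AaltoSciComp/scicomp-docs.git
$ cd scicomp-docs
$ git checkout -b fix-typo
$ # ... edit, then build and check locally (see below) ...
$ git commit -a -m "Fix typo in tutorial"
$ git push -u origin fix-typo
$ # ... then open a pull request on Github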
Requirements and building
Set up the environment first (example below, but do as you'd like). The basic requirements are ``sphinx`` and ``sphinx_rtd_theme``, which are also in Ubuntu (``python-sphinx`` and ``python-sphinx-rtd-theme``):
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
Then you can build it locally to test:
$ make html
$ sphinx-autobuild . _build/html/ # starts web server that automatically updates
$ make clean check # Full rebuild and warn of important errors
HTML output is in ``_build/html/index.html``.
Editing
In short: find an example page and copy it. To add sections, add a new page in a subfolder. In order to appear in the sidebar, it has to be linked from a ``toctree`` directive. Check nearby ``index.rst`` pages and add it there.
Recommended pages for copying:
Serial Jobs - tutorial
Python Environments with Conda - topical discussion on a certain item
``/triton/apps/TEMPLATE.rst.template`` - a template for an application information page
Harbor: Container registry for images and artifacts - service page
Most common missed quirks
Double backquotes for literal text, not single. (Why? Single backquotes can be assigned other purposes, like :doc: links, :ref: links, or in other projects :func: and so on. We stay generic so we are compatible with other projects that make a different choice.):
Run ``ssh -X triton.aalto.fi`` to ...
Raw URL links have two trailing underscores. (Why? A single underscore means some other fancy things. Most links are internal reference/docs links):
The `OpenSSH project <https://www.openssh.com/>`__ does...
Internal links have structure: they can be ``:doc:``, ``:ref:``, etc. If you give a link to something, it knows where it is, validates it at build time, and you can give just the link and it takes the title from the target.
You can set a default highlighting language for literal blocks, so you don't have to write ``.. code-block:: LANGUAGE`` all the time: ``.. highlight:: console``. This sets the default for all literal blocks, but you can still use ``.. code-block::`` for other cases (or change the default partway through).
For the command line, use the ``console`` highlighting language instead of ``bash`` or others. ``console`` will highlight the ``$`` and make it not selectable, so it won't be copied.
This isn't relevant to scicomp-docs, but intersphinx lets you link directly to function/etc. definitions in other Sphinx docs, by function name. (This is why rigid structure is nice.) Python for SciComp heavily uses this to great effect. A configuration sketch follows this list.
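For reference, a sketch of how intersphinx is wired up in a project's ``conf.py`` (the mapping shown is the standard Python-docs example, not necessarily this site's configuration):

# conf.py (sketch)
extensions = ["sphinx.ext.intersphinx"]
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
}

After that, a reference like ``:py:func:`len``` resolves and links straight to the Python documentation.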
ReStructured text
ReStructured Text is similar to Markdown for the basics, but has a more strictly defined syntax and more higher-level structure. This allows more semantic markup, more power to compile into different formats (since there isn't embedded HTML), and advanced things like indexing, permanent references, etc.
Restructured text quick reference and home.
Note: literal inline text uses double backquotes (``) instead of a single backquote (`); the latter works but gives a warning.
A very quick guide is below.
Inline syntax
Inline code/monospace, emphasis, strong emphasis
``Inline code/monospace``, *emphasis*, **strong emphasis**
Literal blocks, code highlighting
Literal blocks (= code blocks) use ``::`` and are indented:
Literal block

::

    Literal block
Literal blocks
Literal blocks can also start with a paragraph ending in a double colon, like this:

Literal blocks can also start with a paragraph ending in a double colon,
like this::

    Literal block
If you define a highlight language, it will be used as the default highlight language for every block:
.. highlight:: python
Use ``python`` for Python. Use ``console`` for console commands, and include the ``$`` before the commands. The ``$`` won't be selectable, so copy-and-paste works well.
Internal page links
Linking internally: if possible, use a permanent reference (next section), but you can also refer to specific files by name. Note that for internal links there are no trailing underscores. Internal links can get their text from the target. Internal document links use the ``:doc:`` role:
:doc:`../tut/interactive.rst`
With different text: :doc:`Text <../tut/interactive.rst>`
Internal reference links
Internal links: ReST permanent references across files.
Label things this way (note only one colon):
.. _label-name:
Reference them this way:
:ref:`label-name` (recommended)
`label-name` (short, don't use, no warning if link breaks)
`Text <label-name>` (short, don't use, no warning if link breaks)
URL links
Inline link, or anonymous, or separate, or different text links. Trailing underscores indicate links. Note there should be two underscores for the raw links.
Inline `link <https://www.python.org>`__, or
anonymous__, or
separate_, or
`different text <separate_>`_ links.
Trailing underscores indicate links.
__ https://www.python.org
.. _separate: https://www.python.org
Admonitions: notes, warnings, etc.
Notes, warnings, etc.
Note
This is a note.
Warning
This is a warning.
Admonition directives have titles.
This has misc text.
Dropdown can be clicked to expand.
When it’s not important for everyone to see. :class: dropdown
sets a CSS class which gets interpreted in the HTML.
.. note::
This is a note.
.. warning::
This is a warning.
.. admonition:: Admonition directives have titles.
This has misc text.
.. admonition:: Dropdown can be clicked to expand.
:class: dropdown
When it's not important for everyone to see. ``:class: dropdown``
sets a CSS class which gets interpreted in the HTML.
Indexing
Indexing isn’t currently used.
.. index:: commit; amend
.. index::
commit
commit; message
pair: commit; amend
:index:`commit`
:index:`loop variables <pair: commit; amend>`
Aalto Scientific Computing (ASC) maintains these pages with the help of the Aalto community. This site is open source: all content is licensed under CC-BY 4.0 and all examples under CC0 (public domain). Additionally, this is an open project and we strongly encourage anyone to contribute. For information, see the About this site and the Github links at the top of every page. Either make Github issues, pull requests, or ask for direct commit access. Be bold: the biggest problem is missing information, and mistakes can always be fixed.