Tapis¶
Tapis is an open source, science-as-a-service API platform for powering your digital lab. Documentation is presented below:
Introduction¶
This is the documentation for Tapis V2. The documentation for Tapis V3 is listed in the section below.
The Tapis V2 Platform is an open source, science-as-a-service API platform for powering your digital lab. Tapis allows you to bring together your public, private, and shared high performance computing (HPC), high throughput computing (HTC), Cloud, and Big Data resources under a single, web-friendly REST API.
- Run code
- Manage data
- Collaborate meaningfully
- Integrate anywhere
The Tapis documentation site contains documentation, guides, tutorials, and lots of examples to help you build your own digital lab.
Source Code¶
If you are looking for source code, you can find it here:
Conventions¶
Throughout the documentation you will regularly encounter the following variables. These represent user-specific values that should be replaced when attempting any of the calls using your account.
Variable | Description | Example |
---|---|---|
${API_HOST} | Base hostname of the API | api.tacc.utexas.edu |
${API_VERSION} | Version of the API endpoint | v2.2.8 |
${API_USERNAME} | Username of the current user | nryan |
${API_KEY} | Client key used to request an access token from the Tapis Auth service | hZ_z3f4Hf3CcgvGoMix0aksN4BOD6 |
${API_SECRET} | Client secret used to request an access token from the Tapis Auth service | gTgpCecqtOc6Ao3GmZ_FecVSSV8a |
${API_TOKEN} | Access token used to authenticate to the Tapis APIs on behalf of a user | de32225c235cf47b9965997270a1496c |
JSON Notation¶
{
"active": true,
"created": "2014-09-04T16:59:33.000-05:00",
"frequency": 60,
"id": "0001409867973952-5056a550b8-0001-014",
"internalUsername": null,
"lastCheck": [
{
"created": "2014-10-02T13:03:25.000-05:00",
"id": "0001412273000497-5056a550b8-0001-015",
"message": null,
"result": "PASSED",
"type": "STORAGE"
},
{
"created": "2014-10-02T13:03:25.000-05:00",
"id": "0001411825368981-5056a550b8-0001-015",
"message": null,
"result": "FAILED",
"type": "LOGIN"
}
],
"lastSuccess": "2014-10-02T11:03:13.000-05:00",
"lastUpdated": "2014-10-02T13:03:25.000-05:00",
"nextUpdate": "2014-10-02T14:03:15.000-05:00",
"owner": "systest",
"target": "demo.storage.example.com",
"updateSystemStatus": false,
"_links": {
"checks": {
"href": "https://api.tacc.utexas.edu/monitor/v2/0001409867973952-5056a550b8-0001-014/checks"
},
"notifications": {
"href": "https://api.tacc.utexas.edu/notifications/v2/?associatedUuid=0001409867973952-5056a550b8-0001-014"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/systest"
},
"self": {
"href": "https://api.tacc.utexas.edu/monitor/v2/0001409867973952-5056a550b8-0001-014"
},
"system": {
"href": "https://api.tacc.utexas.edu/systems/v2/demo.storage.example.com"
}
}
}
Javascript dot notation will be used to refer to individual properties of JSON objects. For example, consider the JSON object above:
- active refers to the top-level active attribute in the response object.
- lastCheck.[].result generically refers to the result attribute contained within any of the objects in the lastCheck array.
- lastCheck.[0].result specifically refers to the result attribute contained within the first object in the lastCheck array.
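In Python, for instance, these dot-notation paths map directly onto dictionary and list indexing (a minimal sketch using a trimmed copy of the monitor object above):

```python
# Trimmed copy of the monitor response object shown above.
monitor = {
    "active": True,
    "lastCheck": [
        {"result": "PASSED", "type": "STORAGE"},
        {"result": "FAILED", "type": "LOGIN"},
    ],
}

top_level = monitor["active"]                              # active
first_result = monitor["lastCheck"][0]["result"]           # lastCheck.[0].result
all_results = [c["result"] for c in monitor["lastCheck"]]  # lastCheck.[].result
```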
Versioning¶
The current major version of Tapis is given in the URI immediately following the API resource name. For example, if the endpoint is https://api.tacc.utexas.edu/jobs/v2/, the API version would be v2. The current major version of Tapis is v2 (full version: 2.2.23).
Special Character Handling¶
In certain situations, usually where file system paths and names are involved in some way, Tapis will generate sanitized object names (“slugs”) to make them safe to use. Slugs will be created on the fly by applying the following rules:
- Lowercase the string
- Replace spaces with a dash
- Remove any special characters and punctuation that might require encoding in the URL. Allowed characters are alphanumerics, underscores, periods, and dashes.
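The rules above can be sketched in Python; slugify here is a hypothetical helper for predicting the slug Tapis would generate, not part of any Tapis client library:

```python
import re

def slugify(name: str) -> str:
    """Apply the slug rules above: lowercase, spaces to dashes, then drop
    anything that is not alphanumeric, an underscore, a period, or a dash."""
    slug = name.lower().replace(" ", "-")
    return re.sub(r"[^a-z0-9_.\-]", "", slug)

print(slugify("My Demo System #2!"))  # my-demo-system-2
```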
Secure communication¶
Tapis uses SSL to secure communication with the clients. If HTTPS is not specified in the request, the request will be redirected to a secure channel.
Requests¶
The Tapis API is based on REST principles: data resources are accessed via standard HTTPS requests in UTF-8 format to an API endpoint. The API uses appropriate HTTP verbs for each action whenever possible.
Verb | Description |
---|---|
GET | Used for retrieving resources |
POST | Used for creating resources |
PUT | Used for manipulating resources or collections |
DELETE | Used for deleting resources |
Common API query parameters¶
Several URL query parameters are common across all services. The following table lists them for reference.
Name | Values | Purpose |
---|---|---|
offset | integer (zero based) | Skips the first offset results in the response |
limit | integer | Limits the number of responses to, at most, this number |
pretty | boolean | If true, pretty prints the response. Default false |
naked | boolean | If true, returns only the value of the result attribute in the standard response wrapper |
filter | string | A comma-delimited list of fields to return for each object in the response. Each field may be referenced using JSON notation |
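Because these parameters are ordinary query-string fields, they can be assembled with any URL library. A sketch using Python's standard library (build_url is a hypothetical helper; the jobs endpoint is the example host used throughout this document):

```python
from urllib.parse import urlencode

def build_url(base: str, **params) -> str:
    """Append common Tapis query parameters to a service URL."""
    return base + "?" + urlencode(params) if params else base

url = build_url(
    "https://api.tacc.utexas.edu/jobs/v2/",
    offset=0,
    limit=50,
    pretty="true",
    filter="name,status",
)
# The comma in the filter list is percent-encoded as %2C.
```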
Experimental query parameters¶
Starting with the 2.1.10 release, two new query parameters were introduced into the Jobs API as an experimental feature. The following table lists them for reference.
Name | Values | Purpose |
---|---|---|
sort | asc,desc | The sort order of the response. asc by default |
sortBy | string | The field by which to sort the response. Any field present in the full representation of the resource that you are querying is supported. Multiple values are not currently supported |
Responses¶
All data is received and returned as a JSON object.
Response Details¶
{
"status": "error",
"message": "Permission denied. You do not have permission to view this system",
"version": "2.1.16-r8228",
"result": {}
}
Apart from the response code, all responses from Tapis are in the form of a JSON object. The object takes the following form.
Key | Value Type | Value Description |
---|---|---|
status | string | Either "success" or "error". On "success", message will be null |
message | string | A short description of the cause of the error |
result | object,array | The JSON response object or array |
version | string | The current full release version of Tapis. Ex “2.1.16-r8228” |
The example above shows the response returned when trying to fetch information for a system to which you do not have access.
Naked Responses¶
In situations where you do not care to parse the wrapper for the raw response data, you may request a naked response from the API by adding naked=true to the request URL. This will return just the value of the result attribute in the response wrapper.
naked=true
{
"id" : "data.iplantcollaborative.org",
"name" : "CyVerse Data Store",
"type" : "STORAGE",
"description" : "CyVerse's petabyte-scale, cloud-based data management service.",
"status" : "UP",
"public" : true,
"lastUpdated" : "2017-10-10T00:00:00.000-05:00",
"default" : true,
"_links" : {
"self" : {
"href" : "https://agave.iplantc.org/systems/v2/data.iplantcollaborative.org"
}
}
}
naked=false
{
"status" : "success",
"message" : null,
"version" : "2.2.8-rff32e62",
"result" : [ {
"id" : "data.iplantcollaborative.org",
"name" : "CyVerse Data Store",
"type" : "STORAGE",
"description" : "CyVerse's petabyte-scale, cloud-based data management service.",
"status" : "UP",
"public" : true,
"lastUpdated" : "2017-10-10T00:00:00.000-05:00",
"default" : true,
"_links" : {
"self" : {
"href" : "https://agave.iplantc.org/systems/v2/data.iplantcollaborative.org"
}
}
} ]
}
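A client that may receive either shape can normalize them with a small helper (a sketch; the wrapper key names match the response wrapper described above):

```python
def unwrap(response):
    """Return the payload whether or not naked=true was used.

    A wrapped response carries the status/message/version/result keys;
    a naked response is already the bare result value.
    """
    wrapper_keys = {"status", "message", "version", "result"}
    if isinstance(response, dict) and wrapper_keys <= response.keys():
        return response["result"]
    return response
```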
Formatting¶
By default, all responses are serialized JSON. To receive pre-formatted JSON, add pretty=true
to any query string.
Note
The tapis-cli also produces table-formatted output.
Pagination¶
Pagination using the limit and offset query parameters.
curl -sk -H \
"Authorization: Bearer ${API_TOKEN}" \
"https://api.tacc.utexas.edu/jobs/v2/?offset=50&limit=25"
tapis jobs list -o 50 -l 25
All resource collections support paging of the dataset, taking offset and limit query parameters. Offset numbering is zero-based, and omitting the offset parameter returns results starting from the first element. By default, all search and listing responses from the Science APIs are paginated in groups of 250 objects; the lone exception is the Files API, which returns all results by default. Check the documentation for the specific endpoint for details.
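The offset/limit scheme can be sketched as a simple page calculator (a hypothetical helper; 250 mirrors the default page size noted above):

```python
def page_params(total: int, limit: int = 250):
    """Yield zero-based (offset, limit) pairs that cover `total` results."""
    for offset in range(0, total, limit):
        yield offset, limit

# Fetching 600 results takes three pages:
pages = list(page_params(600))
# -> [(0, 250), (250, 250), (500, 250)]
```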
Timestamps¶
Timestamps are returned in ISO 8601 format, offset for Central Standard Time (-05:00), e.g. 2014-10-02T13:03:25.000-05:00.
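These timestamps parse directly with Python's standard library (Python 3.7+), using a value taken from the monitor example earlier:

```python
from datetime import datetime, timezone

ts = datetime.fromisoformat("2014-09-04T16:59:33.000-05:00")

# Convert to UTC before comparing values from different services.
ts_utc = ts.astimezone(timezone.utc)
print(ts_utc.isoformat())  # 2014-09-04T21:59:33+00:00
```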
Cross Origin Resource Sharing (CORS)¶
Many modern applications choose to implement client-server communication exclusively in Javascript. For this reason, Tapis provides cross-origin resource sharing (CORS) support so AJAX requests from a web browser are not constrained by cross-origin requests and can safely make GET, PUT, POST, and DELETE requests to the API.
Hypermedia¶
{
"associationIds": [],
"created": "2013-11-16T11:25:38.900-06:00",
"internalUsername": null,
"lastUpdated": "2013-11-16T11:25:38.900-06:00",
"name": "color",
"owner": "nryan",
"uuid": "0001384622738900-5056a550b8-0001-012",
"value": "red",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/0001384622738900-5056a550b8-0001-012"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
}
}
}
Tapis is a fully descriptive hypermedia API. From any point, you should be able to navigate the API through the links provided in the _links object in each resource representation. The user metadata object above contains two referenced objects. The first, self, is common to all objects and contains the URL of that object. The second, owner, contains the URL to the profile of the user who created the object.
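A client can treat _links as the navigation table for any resource. A sketch against the metadata object above (link is a hypothetical helper):

```python
metadata = {
    "name": "color",
    "owner": "nryan",
    "_links": {
        "self": {
            "href": "https://api.tacc.utexas.edu/meta/v2/data/0001384622738900-5056a550b8-0001-012"
        },
        "owner": {"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"},
    },
}

def link(resource: dict, rel: str) -> str:
    """Return the URL of a named relation from a resource's _links object."""
    return resource["_links"][rel]["href"]

owner_profile_url = link(metadata, "owner")
```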
Customizing Responses¶
Returns the name, status, app id, and the url to the archived job output for every user job
curl -sk -H \
"Authorization: Bearer ${API_TOKEN}" \
"https://api.tacc.utexas.edu/jobs/v2/?limit=2&filter=name,status,appId,_links.archiveData.href"
tapis jobs list -v -l 2 -c name -c id -c status -c _links.archiveData
The response would look something like the following:
[
{
"name" : "demo-pyplot-demo-advanced test-1414139896",
"status": "FINISHED",
"appId" : "demo-pyplot-demo-advanced-0.1.0",
"_links": {
"archiveData": {
"href": "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
}
}
},
{
"name": "demo-pyplot-demo-advanced test-1414270831",
"status": "FINISHED",
"appId" : "demo-pyplot-demo-advanced-0.1.0",
"_links": {
"archiveData": {
"href": "https://api.tacc.utexas.edu/jobs/v2/3259859908028273126-242ac115-0001-007/outputs/listings"
}
}
}
]
Returns the system id, type, whether it is your default system, and the hostname from the system’s storage config
curl -sk -H \
"Authorization: Bearer ${API_TOKEN}" \
"https://api.tacc.utexas.edu/systems/v2/?filter=id,type,default,storage.host"
tapis systems list -v -l 2 -c id -c name -c type -c default -c storage.host
The response would look something like the following:
[
{
"id": "user.storage",
"type": "STORAGE",
"default": false,
"storage": {
"host": "data.tacc.utexas.edu"
}
},
{
"id": "docker.tacc.utexas.edu",
"type": "EXECUTION",
"default": false,
"storage": {
"host": "129.114.6.50"
}
}
]
In many situations, Tapis may return too much or too little information in the response to a query. For example, when searching jobs, the inputs and parameters fields are not included in the default summary response objects. You can customize the responses you receive from all the Science APIs using the filter query parameter.
The filter query parameter takes a comma-delimited list of fields to return for each object in the response. Each field may be referenced using JSON notation similar to the search syntax (minus the .[operation] suffix).
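Client-side, the effect of filter on each response object is a projection of dot-notation paths. A sketch (project is a hypothetical helper; the job object is abbreviated from the examples above, with a placeholder href):

```python
def project(obj: dict, fields: str) -> dict:
    """Keep only the comma-delimited dot-notation paths in `fields`."""
    out = {}
    for path in fields.split(","):
        keys = path.split(".")
        node = obj
        for key in keys:          # walk down to the requested value
            node = node[key]
        cursor = out              # rebuild the nested shape for this path
        for key in keys[:-1]:
            cursor = cursor.setdefault(key, {})
        cursor[keys[-1]] = node
    return out

job = {
    "name": "demo-pyplot-demo-advanced test-1414139896",
    "status": "FINISHED",
    "appId": "demo-pyplot-demo-advanced-0.1.0",
    "_links": {"archiveData": {"href": "https://example.com/outputs"}},  # placeholder
}
slim = project(job, "name,status,_links.archiveData.href")
```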
Status Codes¶
The API uses the following response status codes, as defined in RFC 2616, for successful and unsuccessful requests.
Success Codes¶
Response Code | Meaning | Description |
---|---|---|
200 | Success | The request succeeded |
201 | Created | The request succeeded and a new resource was created. Only applicable on PUT and POST actions |
202 | Accepted | The request has been accepted for processing, but the processing has not been completed. Common for all async actions such as job submissions, file transfers, etc |
206 | Partial Content | The server has fulfilled the partial GET request for the resource. This will always be the return status of a request using a Range header |
301 | Moved Permanently | The requested resource has been assigned a new permanent URI. You should follow the Location header, repeating the request |
304 | Not Modified | You requested an action that succeeded, but did not modify the resource |
Error Codes¶
Response Code | Meaning | Description |
---|---|---|
400 | Bad request | Your request was invalid |
401 | Unauthorized | Authentication required, but not provided |
403 | Forbidden | You do not have permission to access the given resource |
404 | Not found | No resource was found at the given URL |
405 | Method Not Allowed | You tried to access a resource with an invalid method |
406 | Not Acceptable | You requested a response format that isn’t supported |
Best Practices¶
General¶
- Always use SSL. Tapis services will force SSL if you don't specify it, but it's best to protect your application with SSL as a best practice.
Systems Service¶
- Use restricted SSH keys whenever possible.
- Open SSH keys are not supported.
- Use SSH keys rather than passwords whenever possible.
- Use a MyProxy Gateway service whenever available rather than a stock MyProxy service to avoid password exposure.
- Always configure a default storage system for your organization. This provides tremendous benefit to users who don't want to think about the makeup of your infrastructure.
- Use contextual naming for systems. nryan-vm-sftp-prod is preferable to my-vm. DNS is also a good approach to naming, but you will still need to contextualize it with something like a username, since multiple users may want to register the same system.
- Grant the minimum sufficient role for a user that enables them to do what you want them to do. Don't grant a PUBLISHER role when a GUEST role will suffice. Don't grant an ADMIN role when a USER role will get the job done.
- Always explicitly specify a scratchDir for your execution systems. This lets you easily see where your job data will go, and avoids systems where your home directory has a smaller quota than other areas of the system.
Files Service¶
- Always favor the full canonical URL over assuming default systems. Default systems may change on a user-to-user basis, but canonical URLs will always be the same.
- Err on the side of privacy by granting permissions to single users and groups rather than making data public.
- Avoid over-sharing by granting permissions on specific files or minimum subtrees rather than sharing entire home folders.
PostIts¶
- Always limit the lifetime of a postit by specifying either the maximum number of uses or an expiration date. This will prevent people from accessing resources long after you intended for them to do so.
Tutorials¶
This tutorial is designed to let you practice and get familiar with the Tapis environment.
Prerequisites¶
In order to navigate this tutorial you should have knowledge and familiarity with the following items:
- SSH with keys to a host
- List files
- Navigate to directories
- Additional basic commands
- Open, edit, save a text file
- File/Dir permissions
- Intro to APIs, HTTP and basics of REST (replace python.requests with curl): https://tacc.github.io/CSC2017Institute/docs/day2/APIs_intro.html
- Intro to HTTP authentication: https://tacc.github.io/CSC2017Institute/docs/day2/Intro_Authentication_in_HTTP.html
- Intro to GNU Coreutils: https://tacc.github.io/ctls2017/docs/gnu_utils/gnu_utils_01.html
Possess a TACC user account¶
In order to obtain a TACC user account, first you must proceed to https://portal.tacc.utexas.edu/account-request?p_p_id=createaccount_WAR_createaccountportlet&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_count=1&_createaccount_WAR_createaccountportlet_action=continue
- At the bottom of the page there is a button to click and accept. Click that button.
- At the next page you must fill out your contact information.
Quick Start Tutorial¶
This quick start guide is designed to show you how to do the following:
- Create an OAuth client.
- Submit a job using the public image classifier app.
- Retrieve job output information.
Create an OAuth client¶
Most requests to the Tapis REST APIs require authorization; that is, the user must have granted permission for an application to access the requested data.
Step 1: Create an OAuth client by entering the following curl command:
curl -sku "$API_USERNAME" -X POST \
-d "clientName=my_cli_app&description=Client app used for scripting up cool stuff" \
https://api.tacc.utexas.edu/clients/v2
Create variables for the client key and secret returned in the response by entering:
export key=<client key>
export secret=<secret>
Step 2: Generate an access token by entering the following curl command:
curl -v -u $key:$secret -X POST \
-d 'grant_type=password&username=<username>&password=<password>&scope=PRODUCTION' \
https://api.tacc.utexas.edu/token
Once you have obtained the token, save it as a variable by entering the following command:
export tok=<TOKEN>
For more information please see:
OAuth tutorial: https://tacc.github.io/CSC2017Institute/docs/day2/Intro_Agave_OAuth.html
Running a job¶
Now you are ready to run a Tapis job. Tapis Jobs is the service that allows you to run applications registered with the Tapis Apps service across multiple, distributed, heterogeneous systems through a common REST interface.
For this tutorial we have registered an image classifier app with the Tapis Apps service. tapis.app.imageclassify-1.0u3 is a public app that uses public storage and execution systems. Follow the steps below to submit a Tapis job and view its output.
Step 1: Crafting the job definition:
Create the following file, jobs.json:
{
"name":"tapis.demo.imageclassify.job",
"appId":"tapis.app.imageclassify-1.0u3",
"archive":false,
"memoryPerNode":"1"
}
The job parameters used in the definition above are:
- name - The user-selected name for the job.
- appId - The unique ID (name + version) of the application run by this job. This must be a valid application that the user has permission to run.
- archive - Whether the job output should be archived. When true, all new files created during job execution will be moved to the archive path on the archive system.
- memoryPerNode - The memory requested for each node on which the job runs. Values are expressed as [num][units], where num can be a decimal number and units can be KB, MB, GB, or TB (default = GB). Examples include 200MB, 1.5GB and 5.
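The memoryPerNode format can be validated before submission with a small check (a hypothetical client-side helper mirroring the [num][units] rule above):

```python
import re

MEMORY_RE = re.compile(r"^(\d+(?:\.\d+)?)(KB|MB|GB|TB)?$")

def parse_memory(value: str):
    """Return (number, units) for a [num][units] value; units default to GB."""
    match = MEMORY_RE.match(value)
    if not match:
        raise ValueError(f"invalid memoryPerNode value: {value!r}")
    return float(match.group(1)), match.group(2) or "GB"

print(parse_memory("200MB"))  # (200.0, 'MB')
print(parse_memory("5"))      # (5.0, 'GB')
```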
Step 2: Submit the job by using the curl-command below:
curl -sk -H "Authorization: Bearer $tok" -X POST -d @jobs.json \
-H "Content-Type: application/json" https://api.tacc.utexas.edu/jobs/v2/
Note: Please make sure to run it from the same folder where you created jobs.json. You should see the message “Successfully submitted job <job-id>”. Every time you submit a job, a unique job ID is created.
Job output¶
You can check the status of the job and retrieve its output at the same time.
Enter the curl command below, where $job_id holds the job ID returned when you submitted the job:
curl -sk -H "Authorization: Bearer $tok" https://api.tacc.utexas.edu/jobs/v2/$job_id/outputs/listings/?pretty=true
NOTE
You can download the files if you want by entering in the command:
curl -sk -H "Authorization: Bearer $tok" https://api.tacc.utexas.edu/jobs/v2/$job_id/outputs/media/$PATH
Guides¶
The Tapis REST APIs enable applications to create and manage digital laboratories that span campuses, the cloud, and multiple data centers using a cohesive set of web-friendly interfaces.
Authorization¶
Most requests to the Tapis REST APIs require authorization; that is, the user must have granted permission for an application to access the requested data. To prove that the user has granted permission, the request header sent by the application must include a valid access token.
Before you can begin the authorization process, you will need to register your client application. That will give you a unique client key and secret key to use in the authorization flows.
Supported Authorization Flows¶
The Tapis REST APIs currently support four authorization flows:
- The Authorization Code flow first gets a code then exchanges it for an access token and a refresh token. Since the exchange uses your client secret key, you should make that request server-side to keep the integrity of the key. An advantage of this flow is that you can use refresh tokens to extend the validity of the access token.
- The Implicit Grant flow is carried out client-side and does not involve secret keys. The access tokens that are issued are short-lived and there are no refresh tokens to extend them when they expire.
- The Resource Owner Password Credentials flow is suitable for native and mobile applications as well as web services. This flow allows client applications to obtain an access token for a user by directly providing the user's credentials in an authentication request. It exposes the user's credentials to the client application and is primarily used in situations where the client application is highly trusted, such as the command line.
- The Client Credentials flow enables users to interact with their own protected resources directly without requiring browser interaction. This is a critical addition for use at the command line, in scripts, and in offline programs. This flow assumes that the person registering the client application and the user on whose behalf requests are made are the same person.
Flow | Can fetch a user’s data by requesting access? | Uses secret key? (key exchange must happen server-side!) | Access token can be refreshed? |
---|---|---|---|
Authorization Code | Yes | Yes | Yes |
Implicit Grant | Yes | No | No |
Resource Owner Password Credentials | Yes | Yes | Yes |
Client Credentials | No | Yes | No |
Unauthorized | No | No | No |
Token lifetimes¶
There are two kinds of tokens you will obtain: access and refresh. Access token lifetimes are configured by the organization operating each tenant and vary based on the flow used to obtain them. By default, access tokens are valid for 4 hours.
Authorization Flow | Access Token Lifetime | Refresh Token Lifetime |
---|---|---|
Authorization | 4 hours | infinite |
Implicit | 1 hour | n/a |
User Credential Password | 4 hours | infinite |
Client Credentials | 4 hours | n/a |
Authorization Code¶
This method is suitable for long-running applications in which the user logs in once and the access token can be refreshed. Since the token exchange involves sending your secret key, it should happen in a secure location, like a backend service, not from a client like a browser or mobile app. This flow is described in RFC-6749. This flow is also the authorization flow used in our REST API Tutorial.
1. Your application requests authorization¶
A typical request will look something like this
https://api.tacc.utexas.edu/authorize/?client_id=gTgp...SV8a&response_type=code&redirect_uri=https%3A%2F%2Fexample.com%2Fcallback&scope=PRODUCTION&state=866
The authorization process starts with your application sending a request to the Tapis authorization service. (The reason your application sends this request can vary: it may be a step in the initialization of your application or in response to some user action, like a button click.) The request is sent to the /authorize endpoint of the Authorization service:
The request will include parameters in the query string:
Request body parameter | Value |
---|---|
response_type | Required. As defined in the OAuth 2.0 specification, this field must contain the value "code". |
client_id | Required. The application's client ID, obtained when the client application was registered with Tapis (see Client Registration). |
redirect_uri | Required. The URI to redirect to after the user grants/denies permission. This URI needs to have been entered in the Redirect URI whitelist that you specified when you registered your application. The value of redirect_uri here must exactly match one of the values you entered when you registered your application, including upper/lowercase, terminating slashes, etc. |
scope | Optional. A space-separated list of scopes. Currently only PRODUCTION is supported. |
state | Optional, but strongly recommended. The state can be useful for correlating requests and responses. Because your redirect_uri can be guessed, using a state value can increase your assurance that an incoming connection is the result of an authentication request. If you generate a random string or encode the hash of some client state (e.g., a cookie) in this state variable, you can validate the response to additionally ensure that the request and response originated in the same browser. This provides protection against attacks such as cross-site request forgery. See RFC-6749. |
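Assembling the request from the table above can be done with any URL library; a sketch in Python, where client_id and redirect_uri are placeholders for your registered client's values:

```python
import secrets
from urllib.parse import urlencode

state = secrets.token_hex(8)  # random value to correlate request and response
params = {
    "client_id": "YOUR_CLIENT_ID",                   # placeholder
    "response_type": "code",
    "redirect_uri": "https://example.com/callback",  # placeholder
    "scope": "PRODUCTION",
    "state": state,
}
authorize_url = "https://api.tacc.utexas.edu/authorize/?" + urlencode(params)
```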
2. The user is asked to authorize access within the scopes¶
The Tapis Authorization service presents details of the scopes for which access is being sought. If the user is not logged in, they are prompted to do so using their API username and password.
When the user is logged in, they are asked to authorize access to the actions and services defined in the scopes.
3. The user is redirected back to your specified URI¶
Let’s assume you provided the following callback URL.
https://example.com/callback
After the user accepts (or denies) your request, the Tapis Authorization service redirects back to the redirect_uri. If the user has accepted your request, the response query string contains a code parameter with the authorization code you will use in the next step to retrieve an access token.
Sample success redirect back from the server
https://example.com/callback?code=Pq3S..M4sY&state=866
Query parameter | Value |
---|---|
code | The authorization code that can be exchanged for an access token in the next step. |
state | The value of the state parameter supplied in the request. |
If the user has denied access, there will be no access token and the final URL will have a query string containing the following parameters:
# Sample denial redirect back from the server
https://example.com/callback?error=access_denied&state=867
Query parameter | Value |
---|---|
error | The reason authorization failed, for example: “access_denied” |
state | The value of the state parameter supplied in the request. |
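On your callback endpoint, the success and denial redirects above can be handled by reading the query string (parse_callback is a hypothetical helper):

```python
from urllib.parse import urlparse, parse_qs

def parse_callback(url: str) -> dict:
    """Flatten the query string of a redirect URL into a dict."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}

ok = parse_callback("https://example.com/callback?code=Pq3S..M4sY&state=866")
denied = parse_callback("https://example.com/callback?error=access_denied&state=867")
```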
4. Your application requests refresh and access tokens¶
POST https://api.tacc.utexas.edu/token
When the authorization code has been received, you will need to exchange it for an access token by making a POST request to the Tapis Authorization service, this time to its /token endpoint. The body of this POST request must contain the following parameters:
Request body parameter | Value |
---|---|
grant_type | Required. As defined in the OAuth 2.0 specification, this field must contain the value "authorization_code". |
code | Required. The authorization code returned from the redirect in the previous step. |
redirect_uri | Required. The same redirect_uri supplied when requesting the authorization code. |
client_id | Required. Your application's client key. |
client_secret | Required. Your application's client secret. |
5. The tokens are returned to your application¶
# An example cURL request
curl -X POST -d "grant_type=authorization_code" \
-d "code=Pq3S..M4sY" \
-d "client_id=gTgp...SV8a" \
-d "client_secret=hZ_z3f...BOD6" \
-d "redirect_uri=https%3A%2F%2Fwww.foo.com%2Fauth" \
https://api.tacc.utexas.edu/token
The response would look something like this:
{
"access_token": "a742...12d2",
"expires_in": 14400,
"refresh_token": "d77c...Sacf",
"token_type": "bearer"
}
On success, the response from the Tapis Authorization service has the status code 200 OK in the response header, and a JSON object with the fields in the following table in the response body:
Key | Value type | Value description |
---|---|---|
access_token | string | An access token that can be provided in subsequent calls, for example to Tapis REST APIs. |
token_type | string | How the access token may be used: always "Bearer". |
expires_in | int | The time period (in seconds) for which the access token is valid. (Maximum 14400 seconds, or 4 hours.) |
refresh_token | string | A token that can be sent to the Tapis Authorization service in place of an authorization code. (When the access token expires, send a POST request to the /token endpoint, but use this code in place of an authorization code. A new access token will be returned. A new refresh token might be returned too.) |
6. Use the access token to access the Tapis REST APIs¶
Make a call to the API
curl -H "Authorization: Bearer a742...12d2" \
"https://api.tacc.utexas.edu/profiles/v2/me?pretty=true&naked=true"
The response would look something like this:
{
"create_time": "20140905072223Z",
"email": "rjohnson@mlb.com",
"first_name": "Randy",
"full_name": "Randy Johnson",
"last_name": "Johnson",
"mobile_phone": "(123) 456-7890",
"phone": "(123) 456-7890",
"status": "Active",
"uid": 0,
"username": "rjohnson"
}
Once you have a valid access token, you can include it in the Authorization header for all subsequent requests to APIs in the Platform.
7. Requesting access token from refresh token¶
curl -u $key:$secret \
-d grant_type=refresh_token \
-d refresh_token=$refresh \
https://api.tacc.utexas.edu/token
The response would look something like this.
{
"access_token": "61e6...Mc96",
"expires_in": 14400,
"token_type": "bearer"
}
Access tokens are deliberately set to expire after a short time, usually 4 hours, after which new tokens may be granted by supplying the refresh token originally obtained during the authorization code exchange.
The request is sent to the token endpoint of the Tapis Authorization service:
The body of this POST request must contain the following parameters:
Request body parameter | Value |
---|---|
grant_type | Required. Set it to "refresh_token". |
refresh_token | Required. The refresh token returned from the authorization code exchange. |
The request must also authenticate the client. In the example above this is done with HTTP Basic authentication, passing the client key and secret via curl's -u $key:$secret option.
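Since access tokens expire after expires_in seconds, clients typically cache the token and refresh it shortly before expiry. A sketch (TokenCache is hypothetical; the refresh callable stands in for the POST to /token shown above):

```python
import time

class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, refresh, skew=60):
        self._refresh = refresh  # callable returning a /token response dict
        self._skew = skew        # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def token(self) -> str:
        if time.time() >= self._expires_at - self._skew:
            response = self._refresh()
            self._token = response["access_token"]
            self._expires_at = time.time() + response["expires_in"]
        return self._token
```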
Implicit Grant¶
The Implicit Grant flow is for clients that are implemented entirely in JavaScript and run in the resource owner's browser. You do not need any server-side code to use it. This flow is described in RFC-6749.
1. Your application requests authorization¶
https://api.tacc.utexas.edu/authorize?client_id=gTgp...SV8a&redirect_uri=https%3A%2F%2Fexample.com%2Fcallback&scope=PRODUCTION&response_type=token&state=867
The flow starts off with your application redirecting the user to the /authorize
endpoint of the Authorization service. The request will include parameters in the query string:
Request body parameter | Value |
---|---|
response_type | Required. As defined in the OAuth 2.0 specification, this field must contain the value "token". |
client_id | Required. The application's client ID, obtained when the client application was registered with Tapis (see Client Registration). |
redirect_uri | Required. This parameter is used for validation only (there is no actual redirection). The value of this parameter must exactly match the value of redirect_uri supplied when requesting the authorization code. |
scope | Required. A space-separated list of scopes. Currently only PRODUCTION is supported. |
state | Optional, but strongly recommended. The state can be useful for correlating requests and responses. Because your redirect_uri can be guessed, using a state value can increase your assurance that an incoming connection is the result of an authentication request. If you generate a random string or encode the hash of some client state (e.g., a cookie) in this state variable, you can validate the response to additionally ensure that the request and response originated in the same browser. This provides protection against attacks such as cross-site request forgery. See RFC-6749. |
show_dialog | Optional. Whether or not to force the user to approve the app again if they’ve already done so. If false (default), a user who has already approved the application may be automatically redirected to the URI specified by redirect_uri . If true , the user will not be automatically redirected and will have to approve the app again. |
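The authorize URL from the example above can be assembled programmatically from these parameters. The sketch below is a minimal illustration using only the Python standard library; the client_id, redirect_uri, and state values are placeholders, not real credentials.

```python
from urllib.parse import urlencode

def build_authorize_url(base, client_id, redirect_uri, state, scope="PRODUCTION"):
    """Build the /authorize URL for the implicit grant flow."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "response_type": "token",  # implicit grant uses "token"
        "state": state,
    }
    # urlencode percent-encodes the redirect_uri for us
    return base + "/authorize?" + urlencode(params)

url = build_authorize_url(
    "https://api.tacc.utexas.edu", "my_client_id",
    "http://example.com/callback", "867")
```

Your application would then redirect the user's browser to the resulting URL.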
2. The user is asked to authorize access within the scopes¶
The Tapis Authorization service presents details of the scopes for which access is being sought. If the user is not logged in, they are prompted to do so using their API username and password.
When the user is logged in, they are asked to authorize access to the services defined in the scopes. By default, all of the Core Science APIs fall under a single scope called PRODUCTION.
3. The user is redirected back to your specified URI¶
Let’s assume we specified the following callback address.
https://example.com/callback
A valid success response would be
https://example.com/callback#access_token=Vr17...amUa&token_type=bearer&expires_in=3600&state=867
After the user grants (or denies) access, the Tapis Authorization service redirects the user to the redirect_uri. If the user has granted access, the final URL will contain the following parameters in the URL fragment.
Fragment parameter | Value |
---|---|
access_token | An access token that can be provided in subsequent calls, for example to Tapis Profiles API. |
token_type | Value: "bearer" |
expires_in | The time period (in seconds) for which the access token is valid. |
state | The value of the state parameter supplied in the request. |
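Because the success response above carries the token in the URL fragment (after the `#`) rather than the query string, a client has to parse the fragment itself. A minimal sketch using the Python standard library; the token value is a placeholder taken from the example above.

```python
from urllib.parse import urlparse, parse_qs

def parse_token_fragment(callback_url):
    """Extract access token data from an implicit-grant callback URL."""
    fragment = urlparse(callback_url).fragment
    # parse_qs returns lists; flatten to single values
    return {k: v[0] for k, v in parse_qs(fragment).items()}

data = parse_token_fragment(
    "https://example.com/callback#access_token=Vr17amUa&token_type=bearer"
    "&expires_in=3600&state=867")
```

Validating that the returned state matches the one you sent guards against forged callbacks.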
If the user has denied access, there will be no access token and the final URL will have a query string containing the following parameters. A failed response would resemble:
https://example.com/callback?error=access_denied&state=867
Query parameter | Value |
---|---|
error | The reason authorization failed, for example: “access_denied” |
state | The value of the state parameter supplied in the request. |
4. Use the access token to access the Tapis REST APIs¶
curl -H "Authorization: Bearer 61e6...Mc96" https://api.tacc.utexas.edu/profiles/v2/me?pretty=true
The response would look something like this:
{
"create_time": "20140905072223Z",
"email": "nryan@mlb.com",
"first_name": "Nolan",
"full_name": "Nolan Ryan",
"last_name": "Ryan",
"mobile_phone": "(123) 456-7890",
"phone": "(123) 456-7890",
"status": "Active",
"uid": 0,
"username": "nryan"
}
The access token allows you to make requests to any of the Tapis REST APIs on behalf of the authenticated user.
Resource Owner Password Credentials¶
This method is suitable for scenarios where there is a high degree of trust between the end user and the client application, such as a desktop application, shell script, or server-to-server communication where user authorization is needed. This flow is described in RFC-6749.
1. Your application requests authorization¶
curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
     -d grant_type=password \
     -d username=rjohnson \
     -d password=password \
     -d scope=PRODUCTION \
     https://api.tacc.utexas.edu/token
The response would look something like this:
{
"access_token": "3Dsr...pv21",
"expires_in": 14400,
"refresh_token": "dyVa...MqR0",
"token_type": "bearer"
}
The request is sent to the /token
endpoint of the Tapis Authentication service. The request will include the following parameters in the request body:
Request body parameter | Value |
---|---|
grant_type | Required. Set it to "password". |
username | Required. The username of an active API user |
password | Required. The password of an active API user |
scope | Required. A space-separated list of scopes. Currently only PRODUCTION is supported |
The header of this POST request must contain the following parameter:
Header parameter | Value |
---|---|
Authorization | Required. Base64-encoded string containing the client ID and client secret. The field must have the format: Authorization: Basic <base64 encoded client_id:client_secret>. (This can also be achieved with curl using the -u option and specifying the raw colon-separated client_id:client_secret.) |
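The Basic value is simply the Base64 encoding of client_id:client_secret, the same header curl's -u option produces. A minimal sketch with placeholder credentials:

```python
import base64

def basic_auth_header(client_id, client_secret):
    """Build the Authorization header value from a client key pair."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

header = basic_auth_header("my_client_id", "my_client_secret")
```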
2. Use the access token to access the Tapis REST APIs¶
curl -H "Authorization: Bearer 3Dsr...pv21" \
     https://api.tacc.utexas.edu/profiles/v2/me?pretty=true
The response would look something like this:
{
"create_time": "20140905072223Z",
"email": "rjohnson@mlb.com",
"first_name": "Randy",
"full_name": "Randy Johnson",
"last_name": "Johnson",
"mobile_phone": "(123) 456-7890",
"phone": "(123) 456-7890",
"status": "Active",
"uid": 0,
"username": "rjohnson"
}
The access token allows you to make requests to any of the Tapis REST APIs on behalf of the authenticated user.
3. Requesting access token from refresh token¶
curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
     -d grant_type=refresh_token \
     -d refresh_token=dyVa...MqR0 \
     -d scope=PRODUCTION \
     https://api.tacc.utexas.edu/token
The response would look something like this:
{
"access_token": "8erF...NGly",
"expires_in": 14400,
"token_type": "bearer"
}
Access tokens are deliberately set to expire after a short time, usually 4 hours, after which a new token may be granted by supplying the refresh token obtained during the original request.
The request is sent to the token endpoint of the Tapis Authorization service. The body of this POST request must contain the following parameters:
Request body parameter | Value |
---|---|
grant_type | Required. Set it to "refresh_token". |
refresh_token | Required. The refresh token returned from the authorization code exchange. |
scope | Required. A space-separated list of scopes. Currently only PRODUCTION is supported. |
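The expiry bookkeeping can be handled by a small wrapper that refreshes shortly before the expires_in window lapses. The sketch below is an illustration, not a Tapis-prescribed pattern: the fetch_token callable (which would POST to /token as shown above) is injected, and the token-response shape mirrors the JSON responses in this section.

```python
import time

class TokenManager:
    """Caches an access token and refreshes it before it expires."""

    def __init__(self, fetch_token, skew=60, clock=time.time):
        self._fetch = fetch_token  # callable returning a token-response dict
        self._clock = clock
        self._skew = skew          # refresh this many seconds early
        self._token = None
        self._expires_at = 0

    def access_token(self):
        # Refresh if we have no token yet, or it is about to expire
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            resp = self._fetch()
            self._token = resp["access_token"]
            self._expires_at = self._clock() + resp["expires_in"]
        return self._token
```

Injecting the clock makes the expiry logic easy to test without waiting four hours.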
Client Credentials¶
This method is suitable for authenticating requests to the Tapis REST APIs when the client application is acting on its own behalf. This flow is described in RFC-6749.
1. Your application requests authorization¶
curl -sk -H "Authorization: Basic Qt3c...Rm1y=" \
     -d grant_type=client_credentials \
     -d scope=PRODUCTION \
     https://api.tacc.utexas.edu/token
The response would look something like this:
{
"access_token": "61e6...Mc96",
"expires_in": 14400,
"token_type": "bearer"
}
The request is sent to the /token
endpoint of the Tapis Authentication service. The request must include the following parameters in the request body:
Request body parameter | Value |
---|---|
grant_type | Required. Set it to "client_credentials". |
scope | Optional. A space-separated list of scopes. Currently only PRODUCTION is supported. |
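The same request can be prepared in Python. The sketch below only constructs the request object and does not send it; the credentials are placeholders, and sending would use urllib.request.urlopen (or any HTTP client) in real code.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

def client_credentials_request(token_url, client_id, client_secret,
                               scope="PRODUCTION"):
    """Prepare (but do not send) a client_credentials token request."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({"grant_type": "client_credentials",
                      "scope": scope}).encode()
    return Request(token_url, data=body, method="POST",
                   headers={"Authorization": "Basic " + creds,
                            "Content-Type": "application/x-www-form-urlencoded"})

req = client_credentials_request(
    "https://api.tacc.utexas.edu/token", "my_id", "my_secret")
```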
2. Use the access token to access the Tapis REST APIs¶
curl -H "Authorization: Bearer 61e6...Mc96" \
     https://api.tacc.utexas.edu/profiles/v2/me
The response would look something like this:
{
"email": "nryan@mlb.com",
"firstName" : "Nolan",
"lastName" : "Ryan",
"position" : "null",
"institution" : "Houston Astros",
"phone": "(123) 456-7890",
"fax" : null,
"researchArea" : null,
"department" : null,
"city" : "Houston",
"state" : "TX",
"country" : "USA",
"gender" : "M",
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"users" : {
"href" : "https://api.tacc.utexas.edu/profiles/v2/nryan/users"
}
}
}
The access token allows you to make requests to any of the Tapis REST APIs on behalf of the authenticated user.
Clients and API Keys¶
By now you already have a user account. Your user account identifies you to the web applications you interact with. A username and password are sufficient for interacting with an application because the application has a user interface, so it knows that the authenticated user is the same one interacting with it. The Tapis API does not have a user interface, so simply providing it a username and password is not sufficient. Tapis needs to know both the user on whose behalf it is acting and the client application that is making the call. Whereas every person has a single user account, they may leverage multiple services to do their daily work.
In different types of Tapis interactions, the user is the same, but the context in which they interact with Tapis is different. Further, the different Tapis interactions may all involve client applications developed by the same organization. The situation is further complicated when one or more 3rd-party client applications are used to leverage the infrastructure. Tapis needs to track both the users and the client applications with whom it interacts. It does this through the issuance of API keys.
Tapis uses OAuth2 to authenticate users and make authorization decisions about what APIs client applications have permission to access. A discussion of OAuth2 is out of scope for this tutorial. You can read more about it on the OAuth2 website or from the websites of any of the many other service providers using it today. In this section, we will walk you through getting your API keys so we can stay focused on learning how to interact with the Tapis APIs.
Creating a new client application¶
In order to interact with any of the Tapis APIs, you will need to first get a set of API keys. You can get your API keys from the Clients service. The example below shows how to get your API keys using both curl and the Tapis CLI.
curl -sku "$API_USERNAME:$API_PASSWORD" -X POST -d "clientName=my_cli_app" -d "description=Client app used for scripting up cool stuff" https://api.tacc.utexas.edu/clients/v2
Note: when using the Tapis CLI, the -S option will store the new API keys for future use so you don't need to manually enter them when you authenticate later.
The response to this call will look something like:
{
"callbackUrl":"",
"key":"gTgp...SV8a",
"secret":"hZ_z3f...BOD6",
"description":"Client app used for scripting up cool stuff",
"name":"my_cli_app",
"tier":"Unlimited",
"_links":{
"self":{
"href":"https://api.tacc.utexas.edu/clients/v2/my_cli_app"
},
"subscriber":{
"href":"https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"subscriptions":{
"href":"https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions/"
}
}
}
Your API keys should be kept in a secure place and not shared with others. This will prevent other, unauthorized client applications from impersonating your application. If you are developing a web application, you should also provide a valid callbackUrl when creating your keys. This will reduce the risk of your keys being reused even if they are compromised. You should also create a unique set of API keys for each client application you develop. This will allow you to better monitor your usage on a client application-to-application basis and reduce the possibility of inadvertently hitting usage quotas due to cumulative usage across client applications.
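One way to honor this advice in scripts is to keep the keys out of your code and in a file that only your user account can read. A minimal sketch; the file path and JSON layout are illustrative assumptions, not a Tapis convention.

```python
import json
import os

def save_client_keys(path, key, secret):
    """Write API keys to a JSON file readable only by the current user."""
    # O_CREAT with mode 0o600: owner read/write, no access for group/other
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump({"key": key, "secret": secret}, fh)

def load_client_keys(path):
    """Read the stored key pair back as a dict."""
    with open(path) as fh:
        return json.load(fh)
```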
Listing your existing client applications¶
curl -sku "$API_USERNAME:$API_PASSWORD" https://api.tacc.utexas.edu/clients/v2
The response to this call will look something like:
[
{
"callbackUrl":"",
"key":"xn8b...0y3d",
"description":"",
"name":"DefaultApplication",
"tier":"Unlimited",
"_links":{
"self":{
"href":"https://api.tacc.utexas.edu/clients/v2/DefaultApplication"
},
"subscriber":{
"href":"https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"subscriptions":{
"href":"https://api.tacc.utexas.edu/clients/v2/DefaultApplication/subscriptions/"
}
}
},
{
"callbackUrl":"",
"key":"gTgp...SV8a",
"description":"Client app used for scripting up cool stuff",
"name":"my_cli_app",
"tier":"Unlimited",
"_links":{
"self":{
"href":"https://api.tacc.utexas.edu/clients/v2/my_cli_app"
},
"subscriber":{
"href":"https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"subscriptions":{
"href":"https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions/"
}
}
}
]
Over time you may develop several client applications. Managing several sets of API keys can become tricky. You can see which applications you have created by querying the Clients service.
Deleting client registrations¶
curl -sku "$API_USERNAME:$API_PASSWORD" -X DELETE https://api.tacc.utexas.edu/clients/v2/my_cli_app
The response to this call is simply a null result object.
At some point you may need to delete a client. You can do this by requesting a DELETE on your client in the Clients service.
Listing current subscriptions¶
curl -sku "$API_USERNAME:$API_PASSWORD" https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions
The response to this call will look something like:
[
{
"context":"/apps",
"name":"Apps",
"provider":"admin",
"status":"PUBLISHED",
"version":"v2",
"tier":"Unlimited",
"_links":{
"api":{
"href":"https://api.tacc.utexas.edu/apps/v2/"
},
"client":{
"href":"https://api.tacc.utexas.edu/clients/v2/systest_test_client"
},
"self":{
"href":"https://api.tacc.utexas.edu/clients/v2/systest_test_client/subscriptions/"
}
}
},
{
"context":"/files",
"name":"Files",
"provider":"admin",
"status":"PUBLISHED",
"version":"v2",
"tier":"Unlimited",
"_links":{
"api":{
"href":"https://api.tacc.utexas.edu/files/v2/"
},
"client":{
"href":"https://api.tacc.utexas.edu/clients/v2/systest_test_client"
},
"self":{
"href":"https://api.tacc.utexas.edu/clients/v2/systest_test_client/subscriptions/"
}
}
},
...
]
When you register a new client application and get your API keys, you are given access to all the Tapis APIs by default. You can see the APIs you have access to by querying the subscriptions collection of your client.
Updating client subscriptions¶
curl -sku "$API_USERNAME:$API_PASSWORD" -X POST -d "name=transforms" https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions
You can also use a wildcard to resubscribe to all active APIs.
curl -sku "$API_USERNAME:$API_PASSWORD" -X POST -d "name=*" https://api.tacc.utexas.edu/clients/v2/my_cli_app/subscriptions
The response to this call will be a JSON array identical to the one returned when listing your subscriptions.
Over time, new APIs will be deployed. When this happens you will need to subscribe to the new APIs. You can do this by POSTing a request to the subscriptions collection with the name of the new API.
Systems¶
A system in Tapis represents a server or collection of servers. A server can be physical, virtual, or a collection of servers exposed through a single hostname or ip address. Systems are identified and referenced in Tapis by a unique ID unrelated to their ip address or hostname. Because of this, a single physical system may be registered multiple times. This allows different users to configure and use a system in whatever way they need to for their specific needs.
Systems come in two flavors: storage and execution. Storage systems are only used for storing and interacting with data. Execution systems are used for running apps (aka jobs or batch jobs) as well as storing and interacting with data.
The Systems service gives you the ability to add and discover storage and compute resources for use in the rest of the API. You may add as many or as few storage systems as you need to power your digital lab. When you register a system, it is private to you and you alone. Systems can also be published into the public space for all users to use. Depending on who is administering Tapis for your organization, this may have already happened and you may already have one or more storage systems available to you by default.
In this tutorial we walk you through how to discover, manage, share, and configure systems for your specific needs. This tutorial is best done in a hands-on manner, so if you do not have a compute or storage system of your own to use, you can grab a VM from our sandbox.
Discovering systems¶
tapis systems list -v
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/
The response will be something like this:
[
{
"id" : "user.storage",
"name" : "Storage VM for the drug discovery portal",
"type" : "STORAGE",
"default" : false,
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/systems/v2/user.storage"
}
},
"available": null,
"description" : "SFTP on drugdiscovery for the drug discovery portal",
"public" : true,
"status" : "UP"
},
{
"id" : "docker.tacc.utexas.edu",
"name" : "Demo Docker VM",
"type" : "EXECUTION",
"default" : false,
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/systems/v2/docker.tacc.utexas.edu"
}
},
"available": null,
"description" : "Cloud VM used for Docker demonstrations and tutorials.",
"public" : true,
"status" : "UP"
}
]
The Systems service allows you to list and search for systems you have registered and systems that have been shared with you. To get a list of all your systems, make a GET request on the Systems collection.
System descriptions can get rather verbose, so a summary object is returned when listing a resource collection. The summary object contains the most critical fields in order to reduce response size when retrieving a user's systems. You can customize this behavior using the filter
query parameter.
The above response may vary depending on who administers Tapis for your organization. To customize this tutorial for your specific account, log in.
Filtering results¶
List only storage systems
tapis systems search -v --type eq STORAGE
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?type=storage
List only execution systems
tapis systems search -v --type eq EXECUTION
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?type=execution
List only public systems
tapis systems search --public eq TRUE
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?publicOnly=true
List only private systems
tapis systems search --public eq FALSE
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?privateOnly=true
Only return default systems
tapis systems search --default eq TRUE
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/?default=true
You can further filter the results by type, scope, and default status. See the search section for further filtering options.
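The query parameters shown in the curl examples above can be composed with a small helper. A minimal sketch; it only builds the listing URL, and the parameter names are taken directly from the examples above.

```python
from urllib.parse import urlencode

def systems_url(base, **filters):
    """Build a Systems service listing URL with optional filter parameters."""
    url = base.rstrip("/") + "/systems/v2/"
    if filters:
        url += "?" + urlencode(filters)
    return url

# e.g. list public storage systems
url = systems_url("https://api.tacc.utexas.edu",
                  type="STORAGE", publicOnly="true")
```

The resulting URL would then be requested with the usual Bearer token header.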
System details¶
tapis systems show -v hpc-tacc-jetstream
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/systems/v2/hpc-tacc-jetstream
The response will be something like this:
{
"id": "hpc-tacc-jetstream",
"name": "TACC Jetstream (Docker Host)",
"type": "EXECUTION",
"default": false,
"_links": {
"metadata": {
"href": "https://api.sd2e.org/meta/v2/data/?q=%7B%22associationIds%22%3A%228014294480571067929-242ac11a-0001-006%22%7D"
},
"roles": {
"href": "https://api.sd2e.org/systems/v2/hpc-tacc-jetstream/roles"
},
"self": {
"href": "https://api.sd2e.org/systems/v2/hpc-tacc-jetstream"
},
"history": {
"href": "https://api.sd2e.org/systems/v2/hpc-tacc-jetstream/history"
}
},
"available": true,
"description": "Linux container support via Docker 17.12.1-ce",
"environment": null,
"executionType": "CLI",
"globalDefault": false,
"lastModified": "2019-09-11T12:49:47.000-05:00",
"login": {
"proxy": null,
"protocol": "SSH",
"port": 22,
"auth": {
"type": "SSHKEYS"
},
"host": "129.114.17.137"
},
"maxSystemJobs": 10,
"maxSystemJobsPerUser": 10,
"owner": "sd2eadm",
"public": true,
"queues": [
{
"maxJobs": 128,
"maxMemoryPerNode": 1,
"default": false,
"maxRequestedTime": "00:15:00",
"name": "short",
"description": "Rapid turnaround jobs",
"maxNodes": 1,
"maxProcessorsPerNode": 1,
"mappedName": null,
"maxUserJobs": 10,
"customDirectives": "-A SD2E-Community"
}
],
"revision": 20,
"scheduler": "FORK",
"scratchDir": "",
"site": "jetstream-cloud.org",
"status": "UP",
"storage": {
"proxy": null,
"protocol": "SFTP",
"mirror": false,
"port": 22,
"auth": {
"type": "SSHKEYS"
},
"host": "129.114.17.137",
"rootDir": "/data/jobs",
"homeDir": "/"
},
"uuid": "8014294480571067929-242ac11a-0001-006",
"workDir": ""
}
To query for detailed information about a specific system, add the system id to the url and make another GET request.
This time, the response will be a JSON object with a full system description; the example above describes an execution system. In the next section we talk more about storage systems and how to register one of your own.
Storage systems¶
A storage system can be thought of as an individual data repository that you want to access through Tapis. The following JSON object shows how a basic storage system is described.
{
"id":"sftp.storage.example.com",
"name":"Example SFTP Storage System",
"type":"STORAGE",
"description":"My example storage system using SFTP to store data for testing",
"storage":{
"host":"storage.example.com",
"port":22,
"protocol":"SFTP",
"rootDir":"/",
"homeDir":"/home/systest",
"auth":{
"username":"systest",
"password":"changeit",
"type":"PASSWORD"
}
}
}
The first four attributes are common to both storage and execution systems. The storage
attribute describes the connectivity and authentication information needed to connect to the remote system. Here we describe an SFTP server accessible on port 22 at host storage.example.com. We specify that the rootDir
, or virtual system root exposed through Tapis, should be the system's physical root directory, and that the authenticated user's home directory should be the homeDir
, or virtual home directory and base of all relative paths given to Tapis. Finally, we tell Tapis to use password-based authentication and provide the necessary credentials.
This example is given as a simple illustration of how to describe a system for use by Tapis. In most situations you should NOT provide your username and password. In fact, if you are using a compute or storage system from your university or a government-funded lab, it is, at best, against the user agreement and, at worst, illegal to give your password to a third-party service such as Tapis. In these situations, use one of the many other authentication options such as SSH keys, X509 authentication, or a 3rd-party authentication service like the MyProxy Gateway.
The full list of storage system attributes is described in the following table.
Attribute | Type | Description |
---|---|---|
available | boolean | Whether the system is currently available for use in the API. Unavailable systems will not be visible to anyone but the owner. This differs from the `status` attribute in that a system may be UP, but not available for use in Tapis. Defaults to true |
description | string | Verbose description of this system. |
id | string | Required: A unique identifier you assign to the system. A system id must be globally unique across a tenant and cannot be reused once deleted. |
name | string | Required: Common display name for this system. |
site | string | The site associated with this system. Primarily for logical grouping. |
status | UP, DOWN, MAINTENANCE, UNKNOWN | The functional status of the system. Systems must be in UP status to be used. |
storage | JSON Object | Required: Storage configuration describing the storage config defining how to connect to this system for data staging. |
type | STORAGE, EXECUTION | Required: Must be STORAGE. |
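The Required rows in the table above can be sanity-checked before submitting a definition to the Systems service. The sketch below is a minimal client-side check and mirrors only those rows; it is not the service's actual validation logic.

```python
def check_storage_system(defn):
    """Return a list of problems with a storage system definition dict."""
    problems = []
    # Top-level required fields per the attributes table
    for field in ("id", "name", "type", "storage"):
        if field not in defn:
            problems.append(f"missing required field: {field}")
    if defn.get("type") != "STORAGE":
        problems.append("type must be STORAGE")
    # Required fields of the storage config
    storage = defn.get("storage", {})
    for field in ("host", "port", "protocol", "auth"):
        if field not in storage:
            problems.append(f"missing required storage field: {field}")
    return problems
```

Running the check on a definition before POSTing it gives faster feedback than a round trip to the API.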
Supported data and authentication protocols¶
The example above described a system accessible by SFTP. Tapis supports many different data and authentication protocols for interacting with your data. Sample configurations for many protocol combinations are given below.
Sample storage system definition with each supported data protocol and authentication configuration.
{
  "id":"sftp.storage.example.com",
  "name":"Example SFTP Storage System",
  "status":"UP",
  "type":"STORAGE",
  "description":"My example storage system using SFTP to store data for testing",
  "site":"example.com",
  "storage":{
    "host":"storage.example.com",
    "port":22,
    "protocol":"SFTP",
    "rootDir":"/",
    "homeDir":"/home/systest",
    "auth":{
      "username":"systest",
      "password":"changeit",
      "type":"PASSWORD"
    }
  }
}
In each sample definition, the storage
object is slightly different, unique to the protocol used. Descriptions of every attribute in the storage
object and its children are given in the following tables.
storage
attributes give basic connectivity information describing things like how to connect to the system and on what port.
Attribute | Type | Description |
---|---|---|
auth | JSON object | Required: A JSON object describing the default authentication credential for this system. |
container | string | The container to use when interacting with an object store. Specifying a container provides isolation when exposing your cloud storage accounts so users do not have access to your entire storage account. This should be used in combination with delegated cloud credentials such as an AWS IAM user credential. |
homeDir | string | The path on the remote system, relative to rootDir, to use as the virtual home directory for all API requests. This will be the base of any requested paths that do not begin with a '/'. Defaults to '/', thus being equivalent to rootDir . |
host | string | Required: The hostname or ip address of the storage server |
port | int | Required: The port number of the storage server. |
mirror | boolean | Whether permissions set through Tapis should be mirrored onto the storage system itself. Currently, this only applies to IRODS systems. |
protocol | FTP, GRIDFTP, IRODS, IRODS4, LOCAL, S3, SFTP | Required: The protocol used to connect to the storage server. |
publicAppsDir | string | The path on the remote system where apps will be stored if this system is used as the default public storage system. |
proxy | JSON Object | The proxy server through which Tapis will tunnel when submitting jobs. Currently, proxy servers use the same authentication mechanism as the target server. |
resource | string | The name of the default resource to use when defining an IRODS system. |
rootDir | string | The path on the remote system to use as the virtual root directory for all API requests. Defaults to '/'. |
zone | string | The name of the default zone to use when defining an IRODS system. |
storage.auth
attributes give authentication information describing how to authenticate to the system specified in the storage
config above.
Attribute | Type | Description |
---|---|---|
credential | string | The credential used to authenticate to the remote system. Depending on the authentication protocol of the remote system, this could be an OAuth token or an X.509 certificate. |
internalUsername | string | The username of the internal user associated with this credential. |
password | string | The password on the remote system used to authenticate. |
privateKey | string | The private ssh key used to authenticate to the remote system. |
publicKey | string | The public ssh key used to authenticate to the remote system. |
server | JSON object | A JSON object describing the authentication server from which a valid credential may be obtained. Currently only auth type X509 supports this attribute. |
type | APIKEYS, LOCAL, PAM, PASSWORD, SSHKEYS, or X509 | Required: The authentication type to use when connecting to the remote system. |
username | string | The remote username used to authenticate. |
storage.auth.server
attributes give information about how to obtain a credential that can be used in the authentication process. Currently only systems using the X509 authentication can leverage this feature to communicate with MyProxy and MyProxy Gateway servers.
Attribute | Type | Description |
---|---|---|
name | string | A descriptive name given to the credential server |
endpoint | string | Required: The endpoint of the authentication server. |
port | integer | Required: The port on which to connect to the server. |
protocol | MPG, MYPROXY | Required: The protocol with which to obtain an authentication credential. |
system.proxy
configuration attributes give information about how to connect to a remote system through a proxy server. This often happens when the target system is behind a firewall or resides on a NAT. Currently proxy servers can only reuse the authentication configuration provided by the target system.
Attribute | Type | Description |
---|---|---|
name | string | Required: A descriptive name given to the proxy server. |
host | string | Required: The hostname of the proxy server. |
port | integer | Required: The port on which to connect to the proxy server. If null, the port in the parent storage config is used. |
If you have not yet set up a system of your own, now is a good time to grab a sandbox system to use while you follow along with the rest of this tutorial.
Creating a new storage system¶
tapis systems create -v -F sftp-password.json
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -F "fileToUpload=@sftp-password.json" https://api.tacc.utexas.edu/systems/v2
The response from the service will be similar to the following:
{
"site": null,
"id": "sftp.storage.example.com",
"revision": 1,
"default": false,
"lastModified": "2016-09-06T17:46:42.621-05:00",
"status": "UP",
"description": "My example storage system using SFTP to store data for testing",
"name": "Example SFTP Storage System",
"owner": "nryan",
"globalDefault": false,
"available": true,
"uuid": "4036169328045649434-242ac117-0001-006",
"public": false,
"type": "STORAGE",
"storage": {
"mirror": false,
"port": 22,
"homeDir": "/home/systest",
"protocol": "SFTP",
"host": "storage.example.com",
"publicAppsDir": null,
"proxy": null,
"rootDir": "/",
"auth": {
"type": "PASSWORD"
}
},
"_links": {
"roles": {
"href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/roles"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"credentials": {
"href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/credentials"
},
"self": {
"href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com"
},
"metadata": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
}
}
}
Congratulations, you just added your first system. This storage system can now be used by the Files service to manage data, the Transfer service as a source or destination of data movement, the Apps service as an application repository, and the Jobs service as both a staging and archiving destination.
Notice that the JSON returned from the Systems service is different from what was submitted. Several fields have been added, and several others have been removed. The uuid
of the system has been added; this is the same UUID that is used in notifications and metadata references. The status
value was assigned a default value since we did not specify it. Ditto for the site
attribute.
Three other fields were added. revision
is the number of times this system has been updated. This being our first time registering the system, it is set to 1. public
tells whether this system is published as a shared resource for all users. We will cover this more in the section on System scope. lastModified
is a timestamp of the last time the system was updated.
In the storage
object, the publicAppsDir
and mirror
fields were both added and set to their default values. In this example we are not using a proxy
server, so it was defaulted to null. Last, and most important, all authentication information has been omitted from the response object. Regardless of the authentication type, no user credential information will ever be returned once they are stored.
Execution Systems¶
In contrast to storage systems, execution systems specify compute resources where application binaries can be run. In addition to the storage
attribute found in storage systems, execution systems also have a login
attribute describing how to connect to the remote system to submit jobs as well as several other attributes that allow Tapis to determine how to stage data and run software on the system. The full list of execution system attributes is given in the following tables.
Name | Type | Description |
---|---|---|
available | boolean | Whether the system is currently available for use in the API. Unavailable systems will not be visible to anyone but the owner. This differs from the status attribute in that a system may be UP, but not available for use in Tapis. Defaults to true |
description | string | Verbose description of this system. |
environment | String | List of key-value pairs that will be added to the environment prior to execution of any command. |
executionType | HPC, Condor, CLI | Required: Specifies how jobs should be submitted to the system. HPC and Condor will leverage a batch scheduler. CLI will fork processes. |
id | string | Required: A unique identifier you assign to the system. A system id must be globally unique across a tenant and cannot be reused once deleted. |
maxSystemJobs | integer | Maximum number of jobs that can be queued or running on a system across all queues at a given time. Defaults to unlimited. |
maxSystemJobsPerUser | integer | Maximum number of jobs that can be queued or running on a system for an individual user across all queues at a given time. Defaults to unlimited. |
name | string | Required: Common display name for this system. |
queues | JSON Array | An array of batch queue definitions providing descriptive and quota information about the queues you want to expose on your system. If not specified, no other system queues will be available to jobs submitted using this system. |
scheduler | LSF, LOADLEVELER, PBS, SGE, CONDOR, FORK, COBALT, TORQUE, MOAB, SLURM, CUSTOM_LSF, CUSTOM_LOADLEVELER, CUSTOM_PBS, CUSTOM_SGE, CUSTOM_CONDOR, CUSTOM_COBALT, CUSTOM_TORQUE, CUSTOM_MOAB, CUSTOM_SLURM, UNKNOWN | Required: The type of batch scheduler available on the system. This only applies to systems with executionType HPC and CONDOR. The CUSTOM_* version of each scheduler provides a mechanism for you to override the default scheduler directives added by Tapis and explicitly add your own through the customDirectives field in each of the batchQueue definitions for your system. |
scratchDir | string | Path to use for a job scratch directory. This value is the first choice for creating a job's working directory at runtime. The path will be resolved relative to the rootDir value in the storage config if it begins with a "/", and relative to the system homeDir otherwise. |
site | string | The site associated with this system. Primarily for logical grouping. |
startupScript | String | Path to a script that will be run prior to execution of any command on this system. The path will be a standard path on the remote system. A limited set of system macros are supported in this field: rootDir, homeDir, systemId, and workDir. The standard set of runtime job attributes are also supported. Between the two sets of macros, you should be able to construct distinct paths per job, user, and app. Any environment variables defined in the system description will be added after this script is sourced. If this script fails, output will be logged to the .agave.log file in your job directory. Job submission will still continue regardless of the exit code of the script. |
status | UP, DOWN, MAINTENANCE, UNKNOWN | The functional status of the system. Systems must be in UP status to be used. |
storage | JSON Object | Required: Storage configuration describing the storage config defining how to connect to this system for data staging. |
type | STORAGE, EXECUTION | Required: Must be EXECUTION. |
workDir | string | Path to use for a job working directory. This value will be used if no scratchDir is given. The path will be resolved relative to the rootDir value in the storage config if it begins with a "/", and relative to the system homeDir otherwise. |
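Putting the required attributes together, a minimal execution system definition might look like the following sketch. The hostnames, credentials, scheduler, and queue values here are placeholders for illustration, not working endpoints:

```json
{
  "id": "demo.execute.example.com",
  "name": "Example SSH Execution Host",
  "type": "EXECUTION",
  "executionType": "HPC",
  "scheduler": "SLURM",
  "queues": [
    {
      "name": "normal",
      "default": true
    }
  ],
  "login": {
    "host": "execute.example.com",
    "port": 22,
    "protocol": "SSH",
    "auth": {
      "username": "systest",
      "password": "changeit",
      "type": "PASSWORD"
    }
  },
  "storage": {
    "host": "execute.example.com",
    "port": 22,
    "protocol": "SFTP",
    "rootDir": "/home/systest",
    "homeDir": "/",
    "auth": {
      "username": "systest",
      "password": "changeit",
      "type": "PASSWORD"
    }
  }
}
```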
The startupScript attribute¶
Every time Tapis establishes a connection to an execution system, local or remote, it will attempt to source the startupScript
provided in your system definition. The value of startupScript
may be an absolute path on the system (e.g. "/usr/local/bin/common_aliases.sh", "/home/nryan/.bashrc") or a path relative to the physical home directory of the account used to authenticate to the system (e.g. ".bashrc", ".profile", "agave/scripts/startup.sh").
The startupScript
field supports the use of template variables which Tapis will resolve at runtime before establishing a connection. If you would prefer to specify the startup script as a virtualized path on the system, prepend ${SYSTEM_ROOT_DIR}
to the path. If the system will be made public, you can specify a file relative to the home directory of the calling user by prefixing your startupScript
value with ${SYSTEM_ROOT_DIR}/${SYSTEM_HOME_DIR}/${USERNAME}.
A full list of the variables available is given in the following table.
Variable | Description |
---|---|
SYSTEM_ID | ID of the system (ex. ssh.execute.example.com) |
SYSTEM_UUID | The UUID of the system |
SYSTEM_STORAGE_PROTOCOL | The protocol used to move data to and from this system |
SYSTEM_STORAGE_HOST | The storage host for this system |
SYSTEM_STORAGE_PORT | The storage port for this system |
SYSTEM_STORAGE_RESOURCE | The system resource for iRODS systems |
SYSTEM_STORAGE_ZONE | The system zone for iRODS systems |
SYSTEM_STORAGE_ROOTDIR | The virtual root directory exposed on this system |
SYSTEM_STORAGE_HOMEDIR | The home directory on this system relative to the STORAGE_ROOT_DIR |
SYSTEM_STORAGE_AUTH_TYPE | The storage authentication method for this system |
SYSTEM_STORAGE_CONTAINER | The object store bucket in which the rootDir resides |
SYSTEM_LOGIN_PROTOCOL | The protocol used to establish a session with this system (e.g. SSH, GSISSH). *NOTE: OpenSSH keys are not supported. |
SYSTEM_LOGIN_HOST | The login host for this system |
SYSTEM_LOGIN_PORT | The login port for this system |
SYSTEM_LOGIN_AUTH_TYPE | The login authentication method for this system |
SYSTEM_OWNER | The username of the user who created the system |
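For example, combining the macros above, a startupScript value along the lines of the following sketch would resolve to a per-user script beneath the system's virtual root on a public system. The .bashrc filename is only an illustration:

```json
{
  "startupScript": "${SYSTEM_ROOT_DIR}/${SYSTEM_HOME_DIR}/${USERNAME}/.bashrc"
}
```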
executionType | scheduler | Description |
---|---|---|
HPC | LSF, LOADLEVELER, PBS, SGE, COBALT, TORQUE, MOAB, SLURM | Jobs will be submitted to the local scheduler using the appropriate scheduler commands. Systems with this execution type will not allow forked jobs. |
CONDOR | CONDOR | Jobs will be submitted to the condor scheduler running locally on the remote system. Tapis will not do any installation for you, so the setup and administration of the Condor server is up to you. |
CLI | FORK | Jobs will be started as a forked process and monitored using the system process id. |
When you are describing your system, consider the policies put in place by your system administrators. If the system you are defining has a scheduler, chances are they want you to use it.
Defining batch queues¶
Tapis supports the notion of multiple submit queues. On HPC systems, queues should map to actual batch scheduler queues on the target server. Additionally, queues are used by Tapis as a mechanism for implementing quotas on job throughput in a given queue or across an entire system. Queues are defined as a JSON array of objects assigned to the queues
attribute. The following table summarizes all supported queue parameters.
Name | Type | Description |
---|---|---|
name | string | Arbitrary name for the queue. This will be used in the job submission process, so it should line up with the name of an actual queue on the execution system. |
maxJobs | integer | Maximum number of jobs that can be queued or running within this queue at a given time. Defaults to 10. -1 for no limit |
maxUserJobs | integer | Maximum number of jobs that can be queued or running by any single user within this queue at a given time. Defaults to 10. -1 for no limit |
maxNodes | integer | Maximum number of nodes that can be requested for any job in this queue. -1 for no limit |
maxProcessorsPerNode | integer | Maximum number of processors per node that can be requested for any job in this queue. -1 for no limit |
maxMemoryPerNode | string | Maximum memory per node for jobs submitted to this queue in ###.#[E|P|T|G]B format. |
maxRequestedTime | string | Maximum run time for any job in this queue given in hh:mm:ss format. |
customDirectives | string | Arbitrary text that will be appended to the end of the scheduler directives in a batch submit script. This could include a project number, system-specific directives, etc. |
default | boolean | True if this is the default queue for the system, false otherwise. |
Configuring quotas¶
In the batch queues table above, several attributes exist to specify limits on the number of total jobs and user jobs in a given queue. Corresponding attributes exist in the execution system to specify limits on the number of total and user jobs across an entire system. These attributes, when used appropriately, can be used to tell Tapis how to enforce limits on the concurrent activity of any given user. They can also ensure that Tapis will not unfairly monopolize your systems as your application usage grows.
If you have ever used a shared HPC system before, you should be familiar with batch queue quotas. If not, the important thing to understand is that they are a critical tool to ensure fair usage of any shared resource. As the owner/administrator for your registered system, you can use the batch queues you define to enforce whatever usage policy you deem appropriate.
Consider an example where you are using a VM to run image analysis routines on demand through Tapis. If too many processes run at once, your server will become memory bound and experience performance degradation. To avoid this, you can set a limit using a batch queue configuration that caps the number of simultaneous tasks that can run at once on your server.
Another example where quotas can be helpful is in properly partitioning your system resources. Consider a user analyzing unstructured data. The problem is computationally and memory intensive. To preserve resources, you could create one queue with a moderate maxJobs value and conservative maxMemoryPerNode, maxProcessorsPerNode, and maxNodes values to allow good throughput of small jobs. You could then create another queue with large maxMemoryPerNode, maxProcessorsPerNode, and maxNodes values while only allowing a single job to run at a time. This gives you both high throughput and high capacity on a single system.
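The two-queue setup just described might be sketched as follows. The queue names and limits are illustrative values chosen for this example, not recommendations:

```json
"queues": [
  {
    "name": "small_fast",
    "maxJobs": 50,
    "maxNodes": 1,
    "maxProcessorsPerNode": 4,
    "maxMemoryPerNode": "8GB",
    "maxRequestedTime": "04:00:00",
    "default": true
  },
  {
    "name": "large_slow",
    "maxJobs": 1,
    "maxNodes": 64,
    "maxProcessorsPerNode": 16,
    "maxMemoryPerNode": "1TB",
    "maxRequestedTime": "48:00:00",
    "default": false
  }
]
```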
The following sample queue definitions illustrate some other interesting use cases.
{
  "name": "short_job",
  "mappedName": null,
  "maxJobs": 100,
  "maxUserJobs": 10,
  "maxNodes": 32,
  "maxMemoryPerNode": "64GB",
  "maxProcessorsPerNode": 12,
  "maxRequestedTime": "00:15:00",
  "customDirectives": null,
  "default": true
}
System login protocols¶
As with storage systems, Tapis supports several different protocols and mechanisms for job submission. We already covered scheduler and queue support. Here we illustrate the different login configurations possible. For brevity, only the value of the login
JSON object is shown.
{
  "host": "execute.example.com",
  "port": 22,
  "protocol": "SSH",
  "auth": {
    "username": "systest",
    "password": "changeit",
    "type": "PASSWORD"
  }
}
The full list of login configuration options is given in the following table. We omit the login.auth
and login.proxy
attributes as they are identical to those used in the storage config.
Attribute | Type | Description |
---|---|---|
auth | JSON object | Required: A JSON object describing the default login authentication credential for this system. |
host | string | Required: The hostname or ip address of the server where the job will be submitted. |
port | int | The port number of the server where the job will be submitted. Defaults to the default port of the protocol used. |
protocol | SSH, GSISSH, LOCAL | Required: The protocol used to submit jobs for execution. *NOTE: OpenSSH Keys are not supported. |
proxy | JSON Object | The proxy server through which Tapis will tunnel when submitting jobs. Currently proxy servers will use the same authentication mechanism as the target server. |
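As a sketch, a login config that tunnels through a proxy might look like the following. The hostnames and credentials are placeholders, and the proxy object's host and port field names are assumed here to mirror those of the target server:

```json
{
  "host": "execute.example.com",
  "port": 22,
  "protocol": "SSH",
  "proxy": {
    "host": "gateway.example.com",
    "port": 22
  },
  "auth": {
    "username": "systest",
    "password": "changeit",
    "type": "PASSWORD"
  }
}
```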
Scratch and work directories¶
In the Job Management tutorial we will dive into how Tapis manages the end-to-end lifecycle of running a job. Here we point out two relevant attributes that control where data is staged and where your job will physically run. The scratchDir
and workDir
attributes control where the working directories for each job will be created on an execution system. The following table summarizes the decision making process Tapis uses to determine where the working directories should be created.
rootDir value | homeDir value | scratchDir value | Effective system path for job working directories |
---|---|---|---|
/ | / | — | / |
/ | / | / | / |
/ | / | /scratch | /scratch |
/ | /home/nryan | — | /home/nryan |
/ | /home/nryan | / | / |
/ | /home/nryan | /scratch | /scratch |
/home/nryan | / | — | /home/nryan |
/home/nryan | / | / | /home/nryan |
/home/nryan | / | /scratch | /home/nryan/scratch |
/home/nryan | /home | — | /home/nryan/home |
/home/nryan | /home | / | /home/nryan |
/home/nryan | /home | /scratch | /home/nryan/scratch |
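The resolution rules in the table above can be sketched in a few lines of Python. This is an unofficial illustration of the documented rules, not Tapis code; resolve_job_dir and its argument names are hypothetical:

```python
import posixpath

def resolve_job_dir(root_dir, home_dir, scratch_dir=None):
    """Illustrate how a job working directory location is chosen (per the table above)."""
    # homeDir always resolves beneath the virtual rootDir
    effective_home = posixpath.normpath(posixpath.join(root_dir, home_dir.lstrip("/")))
    if scratch_dir is None:
        # no scratchDir (or workDir) given: fall back to the effective homeDir
        return effective_home
    if scratch_dir.startswith("/"):
        # paths beginning with "/" resolve beneath the virtual rootDir
        return posixpath.normpath(posixpath.join(root_dir, scratch_dir.lstrip("/")))
    # relative paths resolve beneath the effective homeDir
    return posixpath.normpath(posixpath.join(effective_home, scratch_dir))

print(resolve_job_dir("/home/nryan", "/home", "/scratch"))  # /home/nryan/scratch
```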
While it is not required, it is a best practice to always specify scratchDir
and workDir
values for your execution systems and, whenever possible, place them outside of the system homeDir
to ensure data privacy. The reason for this is that the file system available on many servers is actually made up of a combination of physically attached storage, mounted volumes, and network mounts. Often times, your home directory will have a very conservative quota while the mounted storage will essentially be quota free. As the above table shows, when you do not specify a scratchDir
or workDir
, Tapis will attempt to create your job work directories in your system homeDir
. It is very likely that, in the course of running simulations, you will reach the quota on your home directory, thereby causing that job and all future jobs to fail on the system until you clear up more space. To avoid this, we recommend specifying a location with sufficient available space to handle the work you want to do.
Another common error that arises from not specifying thoughtful scratchDir
and workDir
values for your execution systems is jobs failing due to “permission denied” errors. This often happens when your scratchDir
and/or workDir
resolve to the actual system root. Usually the account you are using to access the system will not have permission to write to /
, so all attempts to create a job working directory fail, accurately, due to a “permission denied” error.
Creating a new execution system¶
tapis systems create -v -F ssh-password.json
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -F "fileToUpload=@ssh-password.json" https://api.tacc.utexas.edu/systems/v2
The response from the server will be similar to the following
{
"id":"demo.execute.example.com",
"uuid":"0001323106792914-5056a550b8-0001-006",
"name":"Example SSH Execution Host",
"status":"UP",
"type":"EXECUTION",
"description":"My example system using ssh to submit jobs used for testing.",
"site":"example.com",
"revision":1,
"public":false,
"lastModified":"2013-07-02T10:16:11.000-05:00",
"executionType":"HPC",
"scheduler":"SGE",
"environment":null,
"startupScript":"./bashrc",
"maxSystemJobs":100,
"maxSystemJobsPerUser":10,
"workDir":"/work",
"scratchDir":"/scratch",
"queues":[
{
"name":"normal",
"maxJobs":100,
"maxUserJobs":10,
"maxNodes":32,
"maxMemoryPerNode":"64GB",
"maxProcessorsPerNode":12,
"maxRequestedTime":"48:00:00",
"customDirectives":null,
"default":true
},
{
"name":"largemem",
"maxJobs":25,
"maxUserJobs":5,
"maxNodes":16,
"maxMemoryPerNode":"2TB",
"maxProcessorsPerNode":4,
"maxRequestedTime":"96:00:00",
"customDirectives":null,
"default":false
}
],
"login":{
"host":"texas.rangers.mlb.com",
"port":22,
"protocol":"SSH",
"proxy":null,
"auth":{
"type":"PASSWORD"
}
},
"storage":{
"host":"texas.rangers.mlb.com",
"port":22,
"protocol":"SFTP",
"rootDir":"/home/nryan",
"homeDir":"",
"proxy":null,
"auth":{
"type":"PASSWORD"
}
}
}
Disabling¶
Disable a system
tapis systems disable $SYSTEM_ID
curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -X PUT --data-binary '{"action": "disable"}' \
  https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID
The response will look something like the following:
{
"site": null,
"id": "sftp.storage.example.com",
"revision": 1,
"default": false,
"lastModified": "2016-09-06T17:46:42.621-05:00",
"status": "UP",
"description": "My example storage system using SFTP to store data for testing",
"name": "Example SFTP Storage System",
"owner": "nryan",
"globalDefault": false,
"available": false,
"uuid": "4036169328045649434-242ac117-0001-006",
"public": false,
"type": "STORAGE",
"storage": {
"mirror": false,
"port": 22,
"homeDir": "/home/systest",
"protocol": "SFTP",
"host": "storage.example.com",
"publicAppsDir": null,
"proxy": null,
"rootDir": "/",
"auth": {
"type": "PASSWORD"
}
},
"_links": {
"roles": {
"href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/roles"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"credentials": {
"href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/credentials"
},
"self": {
"href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com"
},
"metadata": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
}
}
}
There may be times when you need to disable a system. If your system has scheduled maintenance periods, you may want to disable the system until the maintenance period ends. You can do this by making a PUT request on the system with the action field set to disable, or by simply updating the status to MAINTENANCE. While the system is disabled, all apps and jobs on it are disabled, and all file operations are rejected. Once the system is restored, all operations will pick back up.
Enabling a system¶
Enable a system
tapis systems enable $SYSTEM_ID
curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -X PUT --data-binary '{"action": "enable"}' \
  https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID
The response will look something like the following:
{
"site": null,
"id": "sftp.storage.example.com",
"revision": 1,
"default": false,
"lastModified": "2016-09-06T17:46:42.621-05:00",
"status": "UP",
"description": "My example storage system using SFTP to store data for testing",
"name": "Example SFTP Storage System",
"owner": "nryan",
"globalDefault": false,
"available": true,
"uuid": "4036169328045649434-242ac117-0001-006",
"public": false,
"type": "STORAGE",
"storage": {
"mirror": false,
"port": 22,
"homeDir": "/home/systest",
"protocol": "SFTP",
"host": "storage.example.com",
"publicAppsDir": null,
"proxy": null,
"rootDir": "/",
"auth": {
"type": "PASSWORD"
}
},
"_links": {
"roles": {
"href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/roles"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"credentials": {
"href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com/credentials"
},
"self": {
"href": "https://api.tacc.utexas.edu/systems/v2/sftp.storage.example.com"
},
"metadata": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%224036169328045649434-242ac117-0001-006%22%7D"
}
}
}
Similarly, to enable a system, make a PUT request with the action field set to enable. Once reenabled, apps, jobs, and file operations on the system will resume.
Deleting systems¶
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID
The call will return an empty result.
In the event you wish to delete a system, you can make a DELETE request on the system URL. Deleting a system will disable the system and all applications published on that system from use. Any running jobs will continue to run, but all pending, archiving, paused, and staged jobs will be killed, and any data archived on that system will no longer be available. Restoring a deleted system requires intervention from your tenant admin. Once deleted, the system id cannot be reused at a later time. Use this operation with care.
If you simply wish to remove a system from service, you can update the system's status or available attributes, depending on whether you want to disable use or visibility.
Multi-user environments¶
If your application supports a multi-user environment and those users do not have API accounts, then you may run into a situation where you are juggling multiple user credentials for a single system. Tapis has a solution for this problem in the form of its Internal User feature. You can map your application users into a private user store Tapis provides you and assign those users credentials on your systems. This allows you to move seamlessly from community users to private users and back without having to alter your application code. For a deep discussion of the mechanics and implications of credential management with internal users, see the Internal User Credential Management guide.
System roles¶
Systems you register are private to you and you alone. You can, however, allow other Tapis clients to utilize the system you define by granting them a role on the system using the systems roles services. The available roles are given in the table below.
Role | Description |
---|---|
GUEST | Gives any authenticated user readonly access to the system. No file operations or job executions are allowed for users with GUEST access. |
USER | Gives a user the ability to run jobs and access data on the system. |
PUBLISHER | All the rights of USER as well as the ability to publish applications listing the system as an execution host. |
ADMIN | All the rights of PUBLISHER as well as the ability to edit and grant roles on the system details. Admins may use the system to access data and run jobs using the default credential assigned to the system, but they may not view or update any of the credentials stored by the system owner. It is not possible for anyone but the system owner to assign or leverage internal user credentials on a system. |
OWNER | Reserved for the user that originally created the system. This role is non-revokable. |
Please see the Systems Roles tutorial for a deep discussion of system roles and how they are used.
System scope¶
Throughout these tutorials and Beginner's Guides, we have referred to both public and private systems. In addition to roles, systems have a concept of scope associated with them. Not to be confused with OAuth scope mentioned in the Authentication Guide, system scope refers to the availability of a system to the general user community. The following table lists the available scopes and their meanings.
Scope | Required role | Description |
---|---|---|
private | Admin | System is visible and available for use to the owner and to anyone whom they grant a role. |
read only | Tenant admin | Storage system is visible and available for data browsing and download by any API user. Write access is restricted unless explicitly granted to a specific user. |
public | Tenant admin | System is visible and available to all users for reading and writing. Virtual user home directories are enforced and write access outside of a user's home directory is restricted unless explicitly granted by a system admin. |
Private systems¶
All systems are private by default. This means that no one can use a system you register without you, or another user with "admin" permissions, granting them a role on that system. Most of the time, unless you are configuring a tenant for your organization, all the systems you register will stay private. Do not mistake the term private for isolated. Private simply means not public. Another way to think of private systems is as "invitation only." You are free to share your system with as many or as few people as you want and it will still remain a private system.
Public systems¶
Public systems are available for use by every API user within your tenant. Once public, systems inherit specific behavior unique to their type
. We will cover each system type in turn.
Public Storage Systems¶
Public storage systems enforce a virtual user home directory with implied user permissions. The following table gives a brief summary of the permission implications. You can read more about this behavior in the Data Permissions tutorial.
rootDir | homeDir | URL path | User permission |
---|---|---|---|
/ | /home | — | READ |
/ | /home | / | READ |
/ | /home | /var | READ |
/ | /home | systest | ALL |
/ | /home | systest/some/subdir | ALL |
/ | /home | rjohnson | NONE |
Notice in the above example that on public systems, users will have implied ownership of a folder matching their username in the system’s homeDir
. In the table, this means that user “systest” will have ownership of the physical home directory /home/systest
on the system after it’s public. It is important that, before publishing a system, you make sure that the account used to access the system can actually write to these folders. Otherwise, users will not be able to access their data on the system you make public.
Before making a system public, make sure that you have a strategy for mapping API users to directories on the system you want to expose. If mapping to the /home
folder on a Unix system, make sure the account used to access the system has write access to all user directories.
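The implied permissions in the table above can be summarized with a small sketch. This is an informal reading of the table for a user accessing a public storage system, not the Files service's actual authorization code; implied_permission is a hypothetical name:

```python
def implied_permission(username, url_path):
    """Informal summary of the public storage system permission table above."""
    # No path, or an absolute path, resolves against rootDir: read-only access
    if not url_path or url_path.startswith("/"):
        return "READ"
    first_segment = url_path.split("/")[0]
    # A relative path under your own virtual home directory grants full access;
    # another user's directory grants none
    return "ALL" if first_segment == username else "NONE"

print(implied_permission("systest", "systest/some/subdir"))  # ALL
```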
Public Execution Systems¶
Public execution systems do not share the same behavior as public storage systems. Unless explicit permission has been given, public execution systems are not accessible for data access by non-privileged users. This is because public systems allow all users to run applications on them and granting public access to the file system would expose user job data to all users. If you do need to expose the data on a public execution system, either register it again as a storage system (using an appropriate rootDir
outside of the system scratchDir
and workDir
paths), or grant specific users a role on the system.
Publishing a system¶
To publish a system and make it public, you make a PUT request on the system’s url.
tapis systems publish -v $SYSTEM_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X PUT \
  --data-binary '{"action":"publish"}' \
  https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID
The response from the service will be the same system description we saw before, this time with the public attribute set to true.
Unpublishing a system¶
tapis systems unpublish -v $SYSTEM_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X PUT \
  --data-binary '{"action":"unpublish"}' \
  https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID
The response from the service will be the same system description we saw before, this time with the public attribute set to false.
To unpublish a system, make the same request with the action
attribute set to unpublish.
Default systems¶
As you continue to use Tapis over time, it will not be uncommon for you to accumulate additional storage and execution systems through both self-registration and other people sharing their systems with you. It may even be the case that you have multiple public systems available to you. In this situation, it is helpful for both you and your users to specify what the default systems should be.
Default systems are the systems that are used when the user does not specify a system to use when performing a remote action in Tapis. For example, specifying an archivePath
in a job request, but no archiveSystem
, or specifying a deploymentPath
in an app description, but no deploymentSystem
. In these situations, Tapis will use the user’s default storage system.
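For instance, a job request along the lines of the following fragment (the name, appId, and path values are hypothetical) specifies an archivePath but no archiveSystem, so the archived output would land on the user's default storage system:

```json
{
  "name": "demo-job",
  "appId": "demo-app-1.0",
  "archive": true,
  "archivePath": "nryan/archive/jobs"
}
```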
Four types of default systems are possible. The following table describes them.
Type | Scope | Role needed to set | Description |
---|---|---|---|
storage | user default | USER | Default storage system for an individual user. This takes priority over any global defaults and will be used in all data operations in lieu of a system being specified for this user. |
storage | global default | Tenant admin | Default storage system for an entire tenant. This will be used as the default storage system whenever a user has not explicitly specified another. Only public systems may be made the global default. |
execution | user default | USER | Default execution system for an individual user. This takes priority over any global defaults and will be used in all app and job operations in lieu of an execution system being specified for this user. In the case of app registration, normal user role requirements apply. |
execution | global default | Tenant admin | Default execution system for an entire tenant. This will be used as the default execution system whenever a user has not explicitly specified another. Only public systems may be made the global default. |
As a best practice, it is recommended to always specify the system you intend to use when interacting with Tapis. This will eliminate ambiguity in each request and make your actions more repeatable over time as the availability and configuration of the global and user default systems may change.
Setting user default system¶
To set a system as the user’s default, you make a PUT request on the system’s url. Only systems the user has access to may be used as their default.
tapis systems default set $SYSTEM_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X PUT \
  --data-binary '{"action":"setDefault"}' \
  https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID
The response from the service will be the same system description we saw before, this time with the default
attribute set to true.
Unsetting user default system¶
tapis systems default unset $SYSTEM_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X PUT \
  --data-binary '{"action":"unsetDefault"}' \
  https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID
The response from the service will be the same system description we saw before, this time with the default
attribute set to false.
To remove a system as the user’s default, make the same request with the action
attribute set to unsetDefault. Keep in mind that you cannot remove the global default system from being the user’s default. You can only set a different one to replace it.
Setting global default system¶
Tenant administrators may wish to set default storage and execution systems for an entire tenant. These are called global default systems. There may be at most one system of each type set as a global default. To set a global default system, first make sure that the system is public; only public systems may be set as a global default. Next, make sure you have administrator permissions for your tenant; only tenant admins may publish systems and manage the global defaults. Lastly, make a PUT request on the system's url with an action attribute in the body set to setGlobalDefault.
tapis systems default set -G $SYSTEM_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X PUT \
  --data-binary '{"action":"setGlobalDefault"}' \
  https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID
The response from the service will be the same system description we saw before, this time with both the default
and public
attributes set to true.
Setting global default systems does not preclude users from manually setting their own default systems. Any user-defined default systems will trump the global default system setting for that user.
To remove a system from being the global default, make the same request with the action
attribute set to unsetGlobalDefault.
tapis systems default unset -G $SYSTEM_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X PUT \
--data-binary '{"action":"unsetGlobalDefault"}' \
https://api.tacc.utexas.edu/systems/v2/$SYSTEM_ID
This time the response from the service will have default
set to false and public
set to true.
Files¶
The Tapis Files service allows you to manage data across multiple storage systems using multiple protocols. It supports traditional file operations such as directory listing, renaming, copying, deleting, and upload/download that are common to most file services. It also supports file importing from arbitrary locations, metadata assignment, and a full access control layer allowing you to keep your data private, share it with your colleagues, or make it publicly available.
Files service URL structure¶
Canonical URL for all file items accessible in the Platform
https://api.tacc.utexas.edu/files/v2/media/system/$SYSTEM_ID/$PATH
Every file and directory referenced through the Files service has a canonical URL, shown in the first example. The following table defines each component:
Token | Description |
---|---|
$SYSTEM_ID | The id of the system where the file or directory lives. These correspond to the ids returned from the Systems service. |
$PATH | (Optional) The path on the remote system. By default, all paths are relative to the home directory defined in the system description. To specify an absolute path, prefix the path with a `/`. For more on path resolution, see the next section. |
Tapis also supports the concept of default systems. If you exclude the /system/$SYSTEM_ID segment from the above URL, the Files service will automatically assume you are referencing your default storage system. Thus, if your default system were api.tacc.cloud, the following two examples would be identical.
If api.tacc.cloud
is your default storage system then
https://api.tacc.utexas.edu/files/v2/media/shared
is equivalent to this:
https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/shared
This comes in especially handy when referencing your default system paths in other contexts such as job requests and when interacting with the Tapis CLI. A good example of this situation is when you have a global default storage system accessible to all your users. In this case, most users will use that for all of their data staging and archiving needs. These users may find it easier not to even think about the system they are using. The default system support in the Files service allows them to do just that.
When building applications against the Files service, it is considered a best practice to always specify the intended system ID when constructing URL paths to avoid situations where users change their default systems. This will also provide long-term stability to your data references and make debugging much easier. You can read more about default systems in the Systems Guide.
Understanding file paths¶
One powerful, but potentially confusing, feature of Tapis is its support for virtualizing system paths. Every registered system specifies both a root directory, rootDir, and a home directory, homeDir, in its storage configuration. rootDir tells Tapis the absolute path on the remote system that it should treat as /. Similar to the Linux chroot command, no requests made to Tapis will ever be resolved to locations outside of rootDir.
Type of storage system | Examples of rootDir values |
---|---|
Linux | |
Cloud | |
iRODS | |
homeDir specifies the path, relative to rootDir, that Tapis should use for relative paths. Since Tapis is stateless, there is no concept of a current working directory. Thus, when you specify a path to Tapis that does not begin with a /, Tapis will always prefix the path with the value of homeDir. The following table gives several examples of how different combinations of rootDir, homeDir, and URL paths will be resolved by Tapis.
"rootDir" value | "homeDir" value | Tapis URL path | Resolved path on system |
---|---|---|---|
/ | / | -- | / |
/ | / | .. | / |
/ | / | home | /home |
/ | / | /home | /home |
/ | /home/nryan | -- | /home/nryan |
/ | /home/nryan | / | / |
/ | /home/nryan | .. | /home |
/ | /home/nryan | nryan | /home/nryan/nryan |
/ | /home/nryan | /nryan | /nryan |
/home/nryan | / | -- | /home/nryan |
/home/nryan | / | .. | /home/nryan |
/home/nryan | /home | / | /home/nryan |
/home/nryan | /home | .. | /home/nryan |
/home/nryan | /home | home | /home/nryan/home/home |
/home/nryan | /home | /bgibson | /home/nryan/bgibson |
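The resolution rules in the table above can be sketched as a small shell function. This is an illustrative helper only, not part of the Tapis CLI or API; the function names are ours:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Tapis path resolution; not part of any Tapis tooling.

# Normalize a path: collapse slashes, resolve "." and "..", never escape "/".
normalize() {
  local IFS='/' part
  local -a out=()
  for part in $1; do
    case "$part" in
      ''|'.') ;;                      # skip empty segments and "."
      '..') [ "${#out[@]}" -gt 0 ] && unset "out[$((${#out[@]} - 1))]" ;;
      *) out+=("$part") ;;
    esac
  done
  echo "/${out[*]}"
}

# Resolve a Tapis URL path against a system's rootDir and homeDir.
resolve_path() {
  local rootDir="$1" homeDir="$2" urlPath="$3" virtual
  case "$urlPath" in
    /*) virtual="$urlPath" ;;           # absolute: relative to rootDir
    *)  virtual="$homeDir/$urlPath" ;;  # relative: prefixed with homeDir
  esac
  # The resolved path can never escape rootDir (chroot-like behavior).
  normalize "$rootDir$(normalize "$virtual")"
}

resolve_path / /home/nryan ..          # → /home
resolve_path /home/nryan /home home    # → /home/nryan/home/home
```

Each call reproduces the corresponding row of the table: a relative path is appended to homeDir, an absolute path is taken relative to rootDir, and the result is clamped inside rootDir.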
Transferring data¶
Before we talk about how to do basic operations on your data, let’s first talk about how you can move your data around. You already have a storage system available to you, so we will start with the “hello world” of data movement, uploading a file.
Uploading data¶
Uploading a file
tapis files upload agave://tacc.work.taccuser files/picksumipsum.txt
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X POST \
-F "fileToUpload=@files/picksumipsum.txt" \
https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan
The response will look something like this:
{
"internalUsername": null,
"lastModified": "2014-09-03T10:28:09.943-05:00",
"name": "picksumipsum.txt",
"nativeFormat": "raw",
"owner": "nryan",
"path": "/home/nryan/picksumipsum.txt",
"source": "http://127.0.0.1/picksumipsum.txt",
"status": "STAGING_QUEUED",
"systemId": "api.tacc.cloud",
"uuid": "0001409758089943-5056a550b8-0001-002",
"_links": {
"history": {
"href": "https://api.tacc.utexas.edu/files/v2/history/system/api.tacc.cloud/nryan/picksumipsum.txt"
},
"self": {
"href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
},
"system": {
"href": "https://api.tacc.utexas.edu/systems/v2/api.tacc.cloud"
}
}
}
You may upload data to a remote system by performing a multipart POST on the Files service. If you are using the Tapis CLI, you can perform recursive directory uploads. If you are manually calling curl or building an app with the Tapis SDK, you will need to implement the recursion yourself. You can take a look in the files-upload
script to see how this is done. The example above uploads a file that we will use in the remainder of this tutorial.
You will see a progress bar while the file uploads, followed by a response from the server with a description of the uploaded file. Tapis does not block during data movement operations, so it may be just a moment before the file physically shows up on the remote system.
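If you do need to implement the recursion yourself, the directory walk is straightforward. The following dry-run sketch prints the requests a recursive upload would make instead of issuing them; the function name and output format are illustrative, not part of any Tapis tooling:

```shell
# Dry-run sketch of a recursive upload; prints requests instead of calling curl.
upload_tree() {
  local src="$1" dest="$2" item rel
  find "$src" -mindepth 1 | sort | while IFS= read -r item; do
    rel="${item#"$src"/}"
    if [ -d "$item" ]; then
      # a real run would PUT --data-binary '{"action":"mkdir","path":"..."}'
      echo "mkdir $dest/$rel"
    else
      # a real run would multipart-POST the file: -F "fileToUpload=@$item"
      echo "upload $item as $dest/$rel"
    fi
  done
}
```

Directories sort before their contents, so each remote directory is created before the files inside it are uploaded.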
Importing data¶
You can also have Tapis download data from an external URL. Rather than making a multipart file upload request, you can pass in a JSON object with the URL and an optional target file name, type, and array of notification subscriptions. Tapis supports several protocols for ingestion, listed in the next table.
Schema | Details |
---|---|
http | Supported with and without user info |
https | Supported with and without user info |
ftp | Anonymous FTP only |
sftp | User info required in URL |
agave | No user info supported. |
To demonstrate how this works, we will import a README.md file from the Tapis Samples git repository in Bitbucket.
Download a file from a web accessible URL
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
--data '{ "url":"https://bitbucket.org/agaveapi/science-api-samples/raw/master/README.md"}' \
https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan
The response will look something like this:
{
"name" : "README.md",
"uuid" : "0001409758713912-5056a550b8-0001-002",
"owner" : "nryan",
"internalUsername" : null,
"lastModified" : "2014-09-10T20:00:55.266-05:00",
"source" : "https://bitbucket.org/agaveapi/science-api-samples/raw/master/README.md",
"path" : "/home/nryan/README.md",
"status" : "STAGING_QUEUED",
"systemId" : "api.tacc.cloud",
"nativeFormat" : "raw",
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/README.md"
},
"system" : {
"href" : "https://api.tacc.utexas.edu/systems/v2/api.tacc.cloud"
},
"history" : {
"href" : "https://api.tacc.utexas.edu/files/v2/history/system/api.tacc.cloud/nryan/README.md"
}
}
}
Downloading data from a third party is done offline as an asynchronous activity, so the response from the server will come right away. One thing worth noting is that the file length given in the response will always be -1. This is because, generally speaking, Tapis does not know the actual source file size until after the response is sent back. The file size will be updated as the download progresses. You can track the progress by querying the destination file item's history. An entry will be present showing the progress of the download.
For this exercise, the file we just downloaded is just a few KB, so you should see it appear in your home folder on api.tacc.cloud
almost immediately. If you were importing larger datasets, the transfer could take significantly longer depending on the network quality between Tapis and the source location. In this case, you would see the file size continue to increase until it completed. In the event of a failed transfer, Tapis will retry several times before canceling the transfer.
Tapis attempts to make smart decisions about how and when to transfer data. This includes leveraging third-party transfers whenever possible, scaling directory copies out horizontally, and taking advantage of chunked or parallel uploads. As a result, data may arrive in a non-deterministic way on the target system. This is normal and should be expected.
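An import request body can carry the optional target file name and notification subscriptions mentioned above in a single shot. A hedged sketch of building such a body follows; the field layout reflects the options described above, and the webhook URL and file name are placeholders of our own:

```shell
# Build an import body with an optional target name and notification
# subscriptions (the callback URL and fileName below are hypothetical).
read -r -d '' IMPORT_BODY <<'EOF' || true
{
  "url": "https://bitbucket.org/agaveapi/science-api-samples/raw/master/README.md",
  "fileName": "samples-readme.md",
  "notifications": [
    { "event": "STAGING_COMPLETED", "url": "https://example.com/hooks/files", "persistent": false },
    { "event": "STAGING_FAILED",    "url": "nryan@example.com",               "persistent": false }
  ]
}
EOF
# A real run would POST it:
#   curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
#     --data "$IMPORT_BODY" \
#     https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan
echo "$IMPORT_BODY"
```

The STAGING_COMPLETED and STAGING_FAILED events are listed in the file events table in the File history section.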
Transferring data¶
Transferring data between systems
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data-binary '{"url":"agave://stampede.tacc.utexas.edu//etc/motd"}' \
https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan
The response from the service will be the same as the one we received importing a file.
Much like downloading data, Tapis can manage the transfer of data between registered systems. This is, in fact, how data is staged prior to running a simulation. Data transfers are carried out asynchronously, so you can simply start a transfer and go about your business. Tapis will ensure it completes. If you would like a notification when the transfer completes or reaches a certain stage, you can subscribe for one or more email, webhook, and/or realtime notifications, and Tapis will alert you as the transfer progresses. The available file events are listed in the File history section below. For more information about the events and notifications systems, please see the Notifications Guide and Event Reference.
In the example above, we transfer a file from stampede.tacc.utexas.edu
to api.tacc.cloud
. While the request looks pretty basic, there is a lot going on behind the scenes. Tapis will authenticate to both systems, check permissions, stream data out of Stampede using GridFTP and proxy it into api.tacc.cloud
using the SFTP protocol, adjusting the transfer buffer size along the way to optimize throughput. Doing this by hand is both painful and error prone. Doing it with Tapis is nearly identical to copying a file from one directory to another on your local system.
One of the benefits of the Files service is that it frees you up to work in parallel and scale with your application demands. In the next example we will use the Files service to create redundant archives of a shared project directory.
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data-binary '{"url":"agave://api.tacc.cloud/nryan/foo_project"}' \
https://api.tacc.utexas.edu/files/v2/media/system/nryan.storage1/
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data-binary '{"url":"agave://api.tacc.cloud/nryan/foo_project"}' \
https://api.tacc.utexas.edu/files/v2/media/system/nryan.storage2/
Notice in the above examples that the Files service works identically regardless of whether the source is a file or directory. If the source is a file, it will copy the file. If the source is a directory, it will recursively process the contents until everything has been copied.
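Because the two redundant-archive requests above differ only in their target system, they are easy to drive from a loop. This dry-run sketch prints each request instead of issuing it; the function name is ours:

```shell
# Dry-run: print one transfer request per target system (ids from the example above).
SOURCE='agave://api.tacc.cloud/nryan/foo_project'
archive_requests() {
  local target
  for target in nryan.storage1 nryan.storage2; do
    # a real run would: curl ... -X POST --data-binary "{\"url\":\"$SOURCE\"}" <url>
    echo "POST https://api.tacc.utexas.edu/files/v2/media/system/$target/ body={\"url\":\"$SOURCE\"}"
  done
}
archive_requests
```

Since transfers run asynchronously, issuing both requests back to back lets Tapis carry out the two copies in parallel.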
Basic data operations¶
Now that we understand how to move data into, out of, and between systems, we will look at how to perform file operations on the data. Again, remember that the Files service gives you a common REST interface to all your storage and execution systems regardless of the authentication mechanism or protocol they use. The examples below will use your default public storage system, but they would work identically with any storage system you have access to.
Directory listing¶
Listing a file or directory
tapis files list -v agave://tacc.work.taccuser/
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://api.tacc.utexas.edu/files/v2/listings/system/api.tacc.cloud/nryan
The response would look something like this:
[
{
"format": "folder",
"lastModified": "2012-08-03T06:30:12.000-05:00",
"length": 0,
"mimeType": "text/directory",
"name": ".",
"path": "nryan",
"permissions": "ALL",
"system": "api.tacc.cloud",
"type": "dir",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan"
},
"system": {
"href": "https://api.tacc.utexas.edu/systems/v2/api.tacc.cloud"
}
}
},
{
"format": "raw",
"lastModified": "2014-09-10T19:47:44.000-05:00",
"length": 3235,
"mimeType": "text/plain",
"name": "picksumipsum.txt",
"path": "nryan/picksumipsum.txt",
"permissions": "ALL",
"system": "api.tacc.cloud",
"type": "file",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
},
"system": {
"href": "https://api.tacc.utexas.edu/systems/v2/api.tacc.cloud"
}
}
}
]
Obtaining a directory listing, or information about a specific file, is done by making a GET request on the /files/v2/listings/ resource.
The response to this contains a summary listing of the contents of your home directory on api.tacc.cloud
. Appending a file path to your commands above would give information on a specific file.
Move, copy, rename, delete¶
Basic file operations are available by sending a PUT request to the /files/v2/media/ collection with the following parameters.
Attribute | Description |
---|---|
action | The action you want to perform. Select one of "move", "copy", "rename", "mkdir". |
path | Full path to the destination file or folder. This may be the name of a new directory or renamed file, or an absolute or relative Tapis path where the file or directory should be copied/moved. |
Copying files and directories¶
Copy a file item within the same system.
tapis files copy AGAVE_URI DESTINATION
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X PUT \
--data-binary '{"action":"copy","path":"$DESTPATH"}' \
https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH
The response from a copy operation will be a JSON object describing the new file or folder.
Copying can be performed on any remote system. Unlike the Unix cp
command, all copy invocations in Tapis will overwrite the destination target if it exists. In the event of a directory collision, the contents of the two directory trees will be merged with the source overwriting the destination. Any overwritten files will maintain their provenance records and have an additional entry added to record the copy operation.
Moving files and directories¶
tapis files move AGAVE_URI DESTINATION
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X PUT \
--data-binary '{"action":"move","path":"$DESTPATH"}' \
https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH
Moving can be performed on any remote system. Moving a file or directory will overwrite the destination target if it exists. Unlike copy operations, the destination will be completely replaced by the source in the event of a collision. No merge will take place. Further, the provenance of the source will replace that of the target.
Renaming files and directories¶
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X PUT \
--data-binary '{"action":"rename","path":"$NEWNAME"}' \
https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH
Renaming, like copying and moving, is only applicable within the context of a single system. Unlike on Unix systems, renaming and moving are not synonymous. When specifying a new name for a file or directory, the new name is relative to the parent directory of the original file or directory. Also, if a file or directory already exists with that name, the operation will fail and an error message will be returned. All provenance information will follow the renamed file or directory.
Creating a new directory¶
tapis files mkdir AGAVE_URI DIRECTORY
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X PUT \
--data-binary '{"action":"mkdir","path":"$NEWDIR"}' \
https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH
Creating a new directory is a recursive action in Tapis. If the parent directories do not exist, they will be created on the fly. If a file or directory already exists with that name, the operation will fail and an error message will be returned.
Deleting a file item¶
tapis files delete AGAVE_URI
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/$PATH
A standard Tapis response with an empty result value will be returned. As with creating a directory, deleting a file or directory is a recursive action in Tapis. No prompt or warning will be given once the request is sent. It is up to you to implement such checks in your application logic and/or user interface.
File history¶
A full history of content changes, permission changes, and access events made through the Files API is recorded for every file and folder on registered Tapis systems. The recorded history events represent a subset of the events thrown by the Files API. Generally speaking, the events saved in a file item's history represent mutations on the physical file item or its metadata.
Direct vs indirect events¶
Tapis will record both direct and indirect events made on a file item. Examples of direct events are transferring a directory from one system to another or renaming a file. An example of an indirect event is a user manually deleting a file from the command line. The table below contains a list of all the provenance actions recorded.
Event | Description |
---|---|
CREATED | File or directory was created |
DELETED | The file was deleted |
RENAME | The file was renamed |
MOVED | The file was moved to another path |
OVERWRITTEN | The file was overwritten |
PERMISSION_GRANT | A user permission was added |
PERMISSION_REVOKE | A user permission was deleted |
STAGING_QUEUED | File/folder queued for staging |
STAGING | File or directory is currently in flight |
STAGING_FAILED | Staging failed |
STAGING_COMPLETED | Staging completed successfully |
PREPROCESSING | Preparing file for processing |
TRANSFORMING_QUEUED | File/folder queued for transform |
TRANSFORMING | Transforming file/folder |
TRANSFORMING_FAILED | Transform failed |
TRANSFORMING_COMPLETED | Transform completed successfully |
UPLOADED | New content was uploaded to the file. |
CONTENT_CHANGED | Content changed within this file/folder. If a folder, this event will be thrown whenever content changes in any file within this folder at most one level deep. |
Out of band file system changes¶
Tapis does not own the storage and execution systems you access through the Science APIs, so it cannot guarantee that every change made to the file system is recorded. Thus, Tapis takes a best-effort approach to provenance, allowing you to choose, through your own use of best practices, how thorough you want the provenance trail of your data to be.
Listing file history¶
List the history of a file item
tapis files history -v agave://tacc.work.taccuser/nryan/picksumipsum.txt
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://api.tacc.utexas.edu/files/v2/history/system/api.tacc.cloud/nryan/picksumipsum.txt
The response contains a list of the file item's history events:
[
{
"status": "DOWNLOAD",
"created": "2016-09-20T19:47:56.000-05:00",
"createdBy": "public",
"description": "File was downloaded"
},
{
"status": "STAGING_QUEUED",
"created": "2016-09-20T19:48:12.000-05:00",
"createdBy": "nryan",
"description": "File/folder queued for staging"
},
{
"status": "STAGING_COMPLETED",
"created": "2016-09-20T19:48:16.000-05:00",
"createdBy": "nryan",
"description": "Staging completed successfully"
},
{
"status": "TRANSFORMING_COMPLETED",
"created": "2016-09-20T19:48:17.000-05:00",
"createdBy": "nryan",
"description": "Your scheduled transfer of http://129.114.97.92/picksumipsum.txt completed staging. You can access the raw file on iPlant Data Store at /home/nryan/picksumipsum.txt or via the API at https://api.tacc.utexas.edu/files/v2/media/system/data.agaveapi.co//nryan/picksumipsum.txt."
}
]
Basic paginated listing of file item history events is available as shown in the example. Currently, the file history service is readonly. The only way to erase the history on a file item is to delete the file item through the API.
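When polling a file's history, for example to watch an import finish, the latest status can be pulled out of the response with plain shell tools. A minimal sketch follows, using a canned response in place of the history curl call and matching the `"status": "..."` formatting shown above; the helper name is ours:

```shell
# Extract the most recent status from a (canned) history response.
HISTORY='[{"status": "STAGING_QUEUED"}, {"status": "STAGING_COMPLETED"}]'
latest_status() {
  # take the last "status" field in document order
  printf '%s' "$1" | grep -o '"status": "[^"]*"' | tail -n 1 | cut -d'"' -f4
}
latest_status "$HISTORY"   # → STAGING_COMPLETED
```

A real script would fetch the JSON with the history request shown above and sleep between polls; a proper JSON parser is preferable when one is available.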
File metadata management¶
In many systems, the concept of metadata is directly tied to the notion of a file system. Tapis takes a broader view of metadata and supports it as its own first class resource in the REST API. For more information on how to leverage metadata in Tapis, please consult the Metadata Guide. In there we cover all aspects of how to manage, search, validate, and associate metadata across your entire digital lab.
File permissions¶
Tapis has a fine-grained permission model supporting use cases from creating and exposing readonly storage systems to sharing individual files and folders with one or more users. The permissions available for files items are listed in the following table. Please note that a user must have WRITE permissions to grant or revoke permissions on a file item.
Name | Description |
---|---|
READ | User can view, but not edit or execute the resource |
WRITE | User can edit, but not view or execute the resource |
EXECUTE | User can execute, but not view or edit the resource |
READ_WRITE | User can view and write the resource, but not execute |
READ_EXECUTE | User can view and execute the resource, but not edit it |
WRITE_EXECUTE | User can edit and execute the resource, but not view it |
ALL | User has full control over the resource |
NONE | User has all permissions revoked on the given resource |
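The permission names decompose into the three Unix-style bits implied by the table. A tiny illustrative helper (not part of any Tapis tooling) makes the mapping explicit:

```shell
# Map a Tapis file permission name to rwx-style bits (per the table above).
pem_to_rwx() {
  case "$1" in
    READ)          echo "r--" ;;
    WRITE)         echo "-w-" ;;
    EXECUTE)       echo "--x" ;;
    READ_WRITE)    echo "rw-" ;;
    READ_EXECUTE)  echo "r-x" ;;
    WRITE_EXECUTE) echo "-wx" ;;
    ALL)           echo "rwx" ;;
    NONE)          echo "---" ;;
    *)             echo "unknown permission: $1" >&2; return 1 ;;
  esac
}
pem_to_rwx READ_WRITE   # → rw-
```

The same three booleans appear as the read, write, and execute fields in the permission responses shown below.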
Listing all permissions¶
List the permissions on a file item
tapis files pems list agave://tacc.work.taccuser/test_folder/picksumipsum.txt
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
'https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?pretty=true'
The response will look something like the following:
[
{
"username": "nryan",
"internalUsername": null,
"permission": {
"read": true,
"write": true,
"execute": true
},
"recursive": true,
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username.eq=nryan"
},
"file": {
"href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
}
}
}
]
To list all permissions for a file item, make a GET request on the file item's permission collection.
List permissions for a specific user¶
List the permissions on a file item for a given user
tapis files pems show agave://tacc.work.taccuser rclemens
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username=rclemens
The response will look something like the following:
{
"username":"rclemens",
"permission":{
"read":true,
"write":true
},
"_links":{
"self":{
"href":"https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username=rclemens"
},
"parent":{
"href":"https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt"
},
"profile":{
"href":"https://api.tacc.utexas.edu/profiles/v2/rclemens"
}
}
}
Checking permissions for a single user is done using Tapis URL query search syntax.
Grant permissions¶
Grant read access to a file item
tapis files pems grant agave://tacc.work.taccuser rclemens READ
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data '{"username":"rclemens", "permission":"READ"}' \
https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt
Grant read and write access to a file item
tapis files pems grant agave://tacc.work.taccuser rclemens READ_WRITE
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data '{"username":"rclemens", "permission":"READ_WRITE"}' \
https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt
The response will look something like the following
[
{
"username": "rclemens",
"internalUsername": null,
"permission": {
"read": true,
"write": true,
"execute": false
},
"recursive": false,
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username.eq=rclemens"
},
"file": {
"href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/rclemens"
}
}
}
]
To grant another user read access to your file item, assign them READ permission. To enable another user to update a file item, grant them READ_WRITE or ALL access.
Delete single user permissions¶
Delete permission for single user on a file item
tapis files pems revoke agave://tacc.work.taccuser rclemens
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data '{"username":"rclemens", "permission":"NONE"}' \
https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt
A response similar to the following will be returned
[
{
"username": "rclemens",
"internalUsername": null,
"permission": {
"read": false,
"write": false,
"execute": false
},
"recursive": false,
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username.eq=rclemens"
},
"file": {
"href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/rclemens"
}
}
}
]
Permissions may be removed for a single user either by making a DELETE request on the file item's user permission resource or, as shown above, by setting that user's permission to NONE. This will immediately revoke all permissions to the file item for that user.
Please note that ownership cannot be revoked or reassigned. The user who created the file item will always have ownership of that item.
Deleting all permissions¶
Delete all permissions on a file item
tapis files pems drop agave://tacc.work.taccuser
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data '{"username":"*", "permission":"NONE"}' \
https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt
An empty response will be returned from the service. Permissions may be cleared for all users on a file item by making a DELETE request on the file item permission collection.
The above operation will delete all permissions for a file item, such that only the owner will be able to access it. Use with care.
Recursive operations¶
Recursively update or delete permissions on a directory
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data '{"username":"*", "permission":"READ_WRITE", "recursive": true}' \
https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/directory/
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?recursive=true
An empty response will be returned from the service on delete. Update will return something like the following.
[
{
"username": "nryan",
"internalUsername": null,
"permission": {
"read": true,
"write": true,
"execute": true
},
"recursive": true,
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/files/v2/pems/system/api.tacc.cloud/nryan/picksumipsum.txt?username.eq=nryan"
},
"file": {
"href": "https://api.tacc.utexas.edu/files/v2/media/system/api.tacc.cloud/nryan/picksumipsum.txt"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
}
}
}
]
When dealing with directories, the permission operations you perform will apply only to the directory item itself. Permissions will not automatically propagate to the directory contents. In cases where you want to recursively apply permissions to the entire directory tree, you can do so by including the recursive attribute in your permission objects, or by adding it to your URL query parameters when making a DELETE request.
Publishing data¶
Tapis provides multiple ways to share your data with your colleagues and the general public. In addition to the standard permission model enabling you to share your data with one or more authenticated users within the Platform, you also have the ability to publish your data and make it available via an unauthenticated public URL. Unlike traditional web and cloud hosting, your data remains in its original location and is served in situ by Tapis upon user request.
Publishing a file or folder is simply a matter of granting the special public user READ permission on it. Similar to the way listings and permissions are exposed through unique paths in the Files API, published data is served from a custom /files/v2/download path. The public data URLs have the following structure:
https://api.tacc.utexas.edu/files/v2/download/<username>/system/<system_id>/<path>
Notice two things. First, a username is inserted after the download path element. This is needed because there is no authorized user for whom to validate system or file ownership on a public request. The username gives the context by which to verify the availability of the system and file item being requested. Second, the system_id
is mandatory in public data requests. This ensures that the public URL remains the same even when the default storage system of the user who published it changes.
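The URL template above can be assembled mechanically from the three components. A small illustrative helper (the function name is ours, not the API's):

```shell
# Build the unauthenticated public download URL from the template above.
public_url() {
  local username="$1" system_id="$2" path="$3"
  # strip any leading slash so the path slots cleanly into the template
  echo "https://api.tacc.utexas.edu/files/v2/download/$username/system/$system_id/${path#/}"
}
public_url nryan data.iplantcollaborative.org nryan/picksumipsum.txt
```

The example call reproduces the public URL shown for the published file later in this section.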
The following sections give examples of publishing files and folders in the Tapis Platform.
See the PostIts Guide for other ways to securely share your data with others.
Publishing individual files¶
Publish file item on your default storage system for public access
tapis files pems grant agave://tacc.work.taccuser/nryan/picksumipsum.txt public READ
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data '{"username":"public", "permission":"READ"}' \
https://api.tacc.utexas.edu/files/v2/pems/nryan/picksumipsum.txt
Publish file item on a named system for public access
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data '{"username":"public", "permission":"READ"}' \
https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/picksumipsum.txt
The response will look something like the following:
{
"username": "public",
"permission": {
"read": true,
"write": false,
"execute": false
},
"recursive": false,
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/picksumipsum.txt?username.eq=public"
},
"file": {
"href": "https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/public"
}
}
}
Publishing a file or folder is simply a matter of giving the special public
user READ
permission on the file. Once published, the file will be available at the following URL:
https://api.tacc.utexas.edu/files/v2/download/nryan/system/data.iplantcollaborative.org/nryan/picksumipsum.txt
Publishing directories¶
Publish directory on your default storage system for public access
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data '{"username":"public", "permission":"READ", "recursive": true}' \
https://api.tacc.utexas.edu/files/v2/pems/nryan/public
Publish directory on a named system for public access
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST \
--data '{"username":"public", "permission":"READ", "recursive": true}' \
https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/public
The response will look something like the following:
{
"username": "public",
"permission": {
"read": true,
"write": false,
"execute": false
},
"recursive": true,
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/public?username.eq=public"
},
"file": {
"href": "https://api.tacc.utexas.edu/files/v2/pems/system/data.iplantcollaborative.org/nryan/public"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/public"
}
}
}
Publishing an entire directory is identical to publishing a single file item. To make all the contents of the directory public as well, include a recursive
field in your request with a value of true
. Once published, the directory and all its contents will be available for download. The above example will make every file and folder in the “nryan/public” directory of “data.iplantcollaborative.org” available for download at the following URL:
https://api.tacc.utexas.edu/files/v2/download/nryan/system/data.iplantcollaborative.org/nryan/public
Remember that whenever you publish a folder, anything you put in that folder becomes publicly available. As with any cloud storage service, think before blindly copying data into your cloud storage. If you want to restrict the duration or frequency with which your public data is accessed, see the PostIts Guide for other ways to securely share your data with others.
Publishing considerations¶
Publishing data through Tapis can be a great way to share and access data. There are situations, however, in which it may not be an ideal choice. Below we list several of the pitfalls users run into when publishing their data.
Large file publishing¶
Before publishing your large datasets, take a step back and consider how you might leverage the Files or Transfers API to reliably serve up your data. HTTP is not the fastest way to serve data, and it may not be the best usage pattern for applications hoping to consume it. Thinking through your use case is well worth the time, even if publishing ends up being the best approach.
Static website hosting¶
Website hosting is a fairly common use case for data publishing. The challenge is that your assets are still hosted remotely from our API servers and fetched on demand. This can create some heavy latency when serving up lots of assets. Depending on the nature of your backend storage solution, it may not easily handle access patterns common to the web. In those situations, you may see some files fail to load from time to time. If your site has many files, even a small failure rate can keep your site from reliably loading.
If you are going to use the file publishing service for web hosting, the following tips can help improve your overall experience.
- Whenever possible, reference versions of your css, fonts, and javascript dependencies hosted on a public CDN. CloudFlare, Google, and Amazon all host public mirrors of the most popular javascript libraries and frameworks. Linking to those can greatly speed up your load time.
- Use a technology like Webpack to reduce the number of files needed to serve your application.
- Lazy load your assets with oclazyload, requirejs, or by including async attributes on your <script> elements.
- Store your assets on a storage system with as little connection and protocol overhead as possible. That means avoiding tape archives, gridftp, overprovisioned shared resources, and systems only accessible through a proxied connection. While the service will still work in all of these situations, it is common for the overhead involved in establishing a connection and authenticating to take longer than the actual file transfer when the file is small. Simply avoiding slower storage protocols can greatly speed up your application’s load time.
Apps¶
An app, in the context of Tapis, is executable code available for invocation through the Tapis Jobs service on a specific execution system. Put another way, an app is a piece of code that you can run on a specific system. If the same code needs to run on multiple systems, each combination of code and system must be defined as a separate app.
Apps are language agnostic and may or may not carry with them their own dependencies. (More on bundling your app in a moment.) Any code that can be forked at the command line or submitted to a batch scheduler can be registered as a Tapis app and run through the Jobs service.
The Apps service is the central registry for all Tapis apps. The Apps service provides permissions, validation, archiving, and revision information about each app in addition to the usual discovery capability. The rest of this tutorial explains in detail how to register an app to the Apps service, how to manage and share apps, and what the different application scopes mean.
Inputs and Parameters¶
In this section we take a detailed look at the inputs
and parameters
sections of your app descriptions. Each of these sections takes an array of JSON objects. Each JSON object represents either a data source that needs staging in prior to job execution or a primary value passed into your app as a parameter. In either case, the JSON object only requires an id
by which to reference the object in a job request, and a type
field indicating primary type if the object represents a parameter.
In practice, you will want to add some descriptive information, constraints, and runtime validation checks to reduce the number of errors users run into when attempting to run your app. The full lists of app input and parameter attributes are provided in their respective sections below. However, before we dive into the next section on app inputs, let’s first get a big-picture view of what we are doing when we define our app’s inputs and parameters.

When a user submits a job request in step 1, they specify the inputs and parameters needed to run that job. Those attributes are defined in your app description. The Jobs service will use your app description to validate the values in the job request and either reject it with a descriptive error message as in step 2, or accept it as in step 4. Once the job request is accepted, the values provided for the inputs and parameters given in the job request are used to replace their corresponding template placeholder values in the wrapper script. For example, the job request assigned a value of foo for the input with id equal to input1. Before submitting the job request to the remote system, the Jobs service will replace all occurrences of ${input1}
in the app wrapper script with foo. The same will happen with param1 and param2. All occurrences of ${param1}
will be replaced with bar and all occurrences of ${param2}
will be replaced with 2, just as specified in the job request.
information_source: Notice that Tapis will not handle variable quoting for you. It is up to you to handle any type casting, escaping, and quoting of template values necessary for your app’s logic.
As we look at how to define inputs and parameters for your app, keep this big picture in mind. The purpose of inputs is to specify data that needs to be staged prior to your job running and to tell your wrapper script about them. The purpose of parameters is to specify variables that need to be passed to your wrapper script. To do this, we only need a simple id by which to reference the values in a job request. The rest of what we will discuss in this tutorial is the mechanism that Tapis provides for you to validate, describe, discover, and restrict application inputs and parameters to provide better user and developer experiences with your app.
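The replacement step described above can be simulated locally. The sketch below uses sed to mimic what the Jobs service does with the example values foo, bar, and 2; the wrapper contents are invented for illustration and are not from any real app:

```shell
# A toy wrapper template containing Tapis-style placeholders
cat > wrapper.tpl <<'EOF'
cp ${input1} input_copy
echo "param1=${param1} param2=${param2}"
EOF

# Mimic the Jobs service replacing each placeholder with its job-request value
sed -e 's|\${input1}|foo|g' \
    -e 's|\${param1}|bar|g' \
    -e 's|\${param2}|2|g' \
    wrapper.tpl > wrapper.sh

cat wrapper.sh
# The resulting script now reads:
#   cp foo input_copy
#   echo "param1=bar param2=2"
```

Note that the substitution is purely textual, which is why (per the note below) quoting and escaping of the substituted values remain your responsibility.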
Inputs¶
The inputs
attribute of your app description contains a JSON array of input objects. An input represents one or more pieces of data that your app will use at runtime. That data can be a single file, a directory, or a response from a web service. It can reside on a system that Tapis knows about, or at a publicly accessible URL. Regardless of where it lives and what it is, Tapis will grab the data (recursively if need be) and copy it to your job’s working directory just before execution.
information_source: In the Job management tutorial, we talk in detail about the job lifecycle. Here we simply point out that Tapis will handle the staging of your app’s deploymentPath
separately from the staging of your assets. Thus, as a best practice, it is preferable to include all of the assets your app needs to run in yourdeploymentPath
rather than defining them as inputs. This will allow Tapis to make better caching decisions and reduce the overall time spent staging files when running a job.
A minimal input object contains a single inputs.[].id
attribute that uniquely identifies it within the context of your app. Any alphanumeric value under 64 characters can be an identifier, but it must be unique among all the inputs and parameters in that app.
{
"id": "input1"
}
Most of the time, such a minimal definition is not helpful. At the very least, you would want some descriptive information, a restriction on the cardinality, and potentially a default value. This can be achieved with the details
, semantics
, and value
objects. The full list of input attributes is shown in the following table. We cover each attribute in the corresponding section below.
Name | Type | Description |
---|---|---|
id | string | Required: The textual id of this input. This value must be unique among all inputs and parameters for an app description. |
details | JSON object | |
details.argument | string | A command line argument or flag to be prepended before the input value. |
details.description | string | Human-readable description of the input. Often used to create contextual help in automatically generated UI. |
details.label | string | Human-readable label for the input. Often implemented as text label next to the field in automatically generated UI. |
details.showArgument | boolean | Whether to include the argument value for this input when performing the template variable replacement during job submission. If true, the details.argument value will be prepended, without spaces, to the actual input value(s). |
details.repeatArgument | boolean | When multiple values are provided for this input, this attribute determines whether to include the argument value before each user-supplied value when performing the template variable replacement during job submission. The details.showArgument value must be true for this value to be applied. |
semantics | JSON object | Describes the semantic definition of this inputs and the filetypes it represents. Multiple ontologies and values are supported. |
semantics.fileTypes | JSON array | Array of string values describing the file types represented by this input. The types correspond to values from the Transforms service. Use “raw-0” for the time being. |
semantics.minCardinality | integer | Minimum number of values this input must have. |
semantics.maxCardinality | integer | Maximum number of values this input can have. A null value or value of -1 indicates no limit. |
semantics.ontology | JSON array | List of ontology terms (or URIs pointing to ontology terms) applicable to the input. We recommend at least specifying an XML Schema Simple Type. |
value | JSON object | A description of the anticipated value and the situations when it is required. |
value.default | string, JSON array | The default value for this input. This value is optional except when value.required is true and value.visible is false. Values may be absolute or relative paths on the user’s default storage system, a Tapis URI, or any valid URL with a supported scheme. |
value.order | integer | The order in which this input should appear when auto-generating a command line invocation. |
value.required | boolean | Required: Is specification of this input mandatory to run a job? |
value.validator | string | Perl-formatted regular expression to restrict valid values. |
value.visible | boolean | When automatically generating a UI, should this field be visible to end users? If false, users will not be able to set this value in their job request. |
value.enquote | boolean | Should the value be surrounded in quotation marks prior to injection into the wrapper template at job runtime? |
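Putting several of these attributes together, a more complete input definition might look like the following sketch. The label, description, argument, validator, and default values here are invented for illustration and are not part of any real app:

```json
{
  "id": "input1",
  "details": {
    "label": "Input file",
    "description": "Plain text file to be processed",
    "argument": "-f ",
    "showArgument": true
  },
  "semantics": {
    "minCardinality": 1,
    "maxCardinality": 1,
    "ontology": [ "http://www.w3.org/2001/XMLSchema#string" ],
    "fileTypes": [ "raw-0" ]
  },
  "value": {
    "default": "picksumipsum.txt",
    "order": 0,
    "required": true,
    "validator": "\\.txt$",
    "visible": true,
    "enquote": false
  }
}
```

Note the trailing space in the argument value: because showArgument prepends the argument without inserting a space, the space must be part of the argument itself if you want one (see the table in the details section below).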
Input details section¶
The inputs.[].details
object contains information specifying how to describe an input in different contexts. The description
and label
values provide human-readable information appropriate for a tooltip and form label, respectively. Neither of these attributes is required; however, they dramatically improve the readability of your app description if you include them.
Oftentimes you will need to translate your input value into actual command line arguments. By default, Tapis will replace all occurrences of your attribute inputs.[].id
in your wrapper script with the value of that attribute in your job description. That means you are responsible for inserting any command line flags or arguments into the wrapper script yourself. This is a straightforward process; however, in situations where an input is optional, the resulting command line could be broken if the user does not specify an input value in their job request. One way to work around this is to add a conditional check to the variable assignment and exclude the command line flag or argument if no value is set. Another is to use the inputs.[].details.argument
attribute.
The inputs.[].details.argument
value describes the command line argument that corresponds to this input, and the inputs.[].details.showArgument
attribute specifies whether the inputs.[].details.argument
value should be injected into the wrapper template in front of the actual runtime value. The following table illustrates the result of these attributes in different scenarios.
argument | showArgument | Input value from job request | Value injected into wrapper template |
---|---|---|---|
(none) | true | /etc/motd | /etc/motd |
-f | true | /etc/motd | -f/etc/motd |
-f (trailing space) | true | /etc/motd | -f /etc/motd |
-f | false | /etc/motd | /etc/motd |
--filename | true | /etc/motd | --filename/etc/motd |
--filename= | true | /etc/motd | --filename=/etc/motd |
--filename | false | /etc/motd | /etc/motd |
Input semantics section¶
The inputs.[].semantics
object contains semantic information about the input. The minCardinality
attribute specifies the minimum number of data sources that must be specified for the input. This attribute is used to validate the value(s) provided for the input in a job request. The ontology
attribute specifies a JSON array of URLs pointing to the ontology definitions of this file type. (We recommend at least specifying an XML Schema Simple Type.) Finally, the fileTypes
attribute contains a JSON array of file type strings as specified in the Transforms service. (In most situations you will leave the fileTypes attribute null or specify raw-0 as the single file type in the array.)
Input value section¶
The inputs.[].value
object contains the information needed to validate user-supplied input values in a job request. The validator
attribute accepts a Perl regular expression which will be applied to the input value(s). Any submissions that do not match the validator
expression will be rejected.
information_source: If inputs[].semantics.minCardinality
is greater than 1, multiple values will be accepted for the input. These values may be provided in a semicolon-delimited list or in a JSON array. The values may be relative paths on the user’s default storage system, or URLs. Whatever value(s) the user provides, the validator will be applied independently to each value.
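For example, a job request supplying multiple values for such an input might contain a fragment like the following sketch; the file names and system id are invented for illustration:

```json
{
  "inputs": {
    "input1": [
      "nryan/file1.txt",
      "agave://data.iplantcollaborative.org/nryan/file2.txt",
      "https://example.com/file3.txt"
    ]
  }
}
```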
The default
attribute allows you to specify a default value for the input. This will be used in lieu of a user-supplied value if the input is required
, but not visible
. All default values must match the validator
expression, if provided.
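As a sketch of how these attributes interact, the value object below declares a required but hidden input, so the (invented) default path is always used, and it satisfies the (invented) validator:

```json
{
  "id": "input1",
  "value": {
    "default": "nryan/picksumipsum.txt",
    "required": true,
    "visible": false,
    "validator": "\\.txt$"
  }
}
```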
The required
attribute specifies whether the input must be specified during a job submission.
The visible
attribute takes a boolean value specifying whether the input should be accepted as a user-supplied value in a job request. If false, the value will be ignored at job submission and the default
value will be used instead. Whenever visible
is set to false, required
must be true.
The order
attribute is used to specify the order in which inputs should be listed in the response from the API and in command-line generation. By default, order
is set to zero. Thus, providing a value greater than zero is sufficient to force any single input to be listed last.
Validating inputs¶
The previous section covered different ways you can have Tapis validate and restrict the data inputs to your app. When a user submits a job request, the checks are applied in the following order.
- visible
- required
- minCardinality
- maxCardinality
- validator
Once an input passes these tests, Tapis will check that it exists and that the user has permission to access the data. Assuming everything passes, the input is accepted and scheduled for staging.
Parameters¶
The parameters
attribute of your app description contains a JSON array of parameter objects. A parameter represents one or more arguments that your app will use at runtime. Those arguments can be more or less anything you want them to be. If, for some reason, your app handles data staging on its own and you do not want Tapis to move the data on your behalf, but you do need a data reference passed in, you can define it as a parameter rather than an input.
A minimal parameter object contains a single id
attribute that uniquely identifies it within the context of your app and a value.type
attribute specifying the primary type of the parameter. Any alphanumeric value under 64 characters can be an identifier, but it must be unique among all the inputs and parameters in that app. The parameter type is restricted to a handful of primary types listed in the table below.
{
"id": "parameter1",
"value": {
"type": "string"
}
}
In most situations you will want some descriptive information and validation of the user-supplied values for this parameter. As with your app inputs, app parameters have details
, semantics
, and value
objects that allow you to do just that. The full list of parameter attributes is shown in the following table. We cover each attribute in the corresponding section below.
Name | Type | Description |
---|---|---|
id | string | Required: The textual id of this parameter. This value must be unique among all inputs and parameters for an app description. |
details | JSON object | |
details.argument | string | A command line argument or flag to be prepended before the parameter value. |
details.description | string | Human-readable description of the parameter. Often used to create contextual help in automatically generated UI. |
details.label | string | Human-readable label for the parameter. Often implemented as text label next to the field in automatically generated UI. |
details.showArgument | boolean | Whether to include the argument value for this parameter when performing the template variable replacement during job submission. If true, the details.argument value will be prepended, without spaces, to the actual parameter value(s). |
details.repeatArgument | boolean | When multiple values are provided for this parameter, this attribute determines whether to include the argument value before each user-supplied value when performing the template variable replacement during job submission. The details.showArgument value must be true for this value to be applied. |
semantics | JSON object | Describes the semantic definition of this parameter. Multiple ontologies and values are supported. |
semantics.minCardinality | integer | Minimum number of values this parameter must have. |
semantics.maxCardinality | integer | Maximum number of values this parameter can have. A null value or value of -1 indicates no limit. |
semantics.ontology | JSON array | List of ontology terms (or URIs pointing to ontology terms) applicable to the parameter. We recommend at least specifying an XML Schema Simple Type. |
value | JSON object | A description of the anticipated value and the situations when it is required. |
value.default | string, JSON array | The default value for this parameter. This value can be left blank except when value.required is true and value.visible is false. If the value.type of this parameter is enumeration, this value must be one of the specified value.enumValues . If the value.type of this parameter is bool or flag, then only boolean values are accepted here. |
value.enumValues | JSON array | An array of values specifying the possible values this parameter may have when value.type is enumeration. Both JSON Objects and strings are supported in the array. If a JSON Object is given, the object must be a single value attribute. The key will be the value passed into the wrapper template. The value will be the display value shown when auto-generating the option element in the select box representing this input. |
value.order | integer | The order in which this parameter should appear when auto-generating a command line invocation. |
value.required | boolean | Required: Is specification of this parameter mandatory to run a job? |
value.type | string, number, enumeration, bool, flag | JSON type for this parameter (used to generate and validate UI). |
value.validator | string | Perl-formatted regular expression to restrict valid values. |
value.visible | boolean | When automatically generating a UI, should this field be visible to end users? If false, users will not be able to set this value in their job request. |
value.enquote | boolean | Should the value be surrounded in quotation marks prior to injection into the wrapper template at job runtime? |
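Tying several of these attributes together, a hypothetical enumeration parameter might be described as follows. The id, label, and argument are invented for illustration; the enumerated values mirror the example in the value section below:

```json
{
  "id": "color",
  "details": {
    "label": "Paint color",
    "argument": "--color=",
    "showArgument": true
  },
  "value": {
    "type": "enumeration",
    "enumValues": [
      { "red": "Deep Cherry Red" },
      { "white": "Bright White" }
    ],
    "default": "red",
    "required": true,
    "visible": true
  }
}
```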
Parameter details section¶
The parameters.[].details
object contains information specifying how to describe a parameter in different contexts and is identical to the inputs.[].details
object.
Parameter semantics section¶
The parameters.[].semantics
object contains semantic information about the parameter. Unlike the inputs.[].semantics
object, it only has a single attribute, ontology
. The ontology
attribute specifies a JSON array of URLs pointing to the ontology definitions of this parameter type. (We recommend at least specifying an XML Schema Simple Type.)
Parameter value section¶
The parameters.[].value
object contains the information needed to validate user-supplied parameter values in a job request. The type
attribute defines the primary type of this parameter’s values. The available types are:
- number: any real number.
- string: any JSON-escaped alphanumeric string.
- bool: true or false.
- flag: true or false. Identical to bool, but only the argument value will be inserted into the wrapper template.
- enumeration: a JSON array of string values or JSON objects representing the acceptable values for this parameter. If an array of JSON objects is given, each object should have a single attribute, with the key being a desired enumeration value and the value being a human-readable display name for it. The advantage of using objects over strings is that object values provide a way to create more descriptive user interfaces by customizing both the content and value of an HTML select box’s option elements. An example of each is given below.
[
"red",
"white",
"green",
"black"
]
[
{ "red": "Deep Cherry Red" },
{ "white": "Bright White" },
{ "green": "Black Forest Green" },
{ "black": "Brilliant Black Crystal Pearl" }
]
The validator
attribute accepts a Perl regular expression which will be applied to the input value(s). Any submissions that do not match the validator
expression will be rejected. This attribute is available to parameters of type number and string. It is not available to bool or flag parameter types, nor to enumeration parameters, which require the enumValues
attribute instead.
The default
attribute allows you to specify a default value for the parameter. This will be used in lieu of a user-supplied value if the parameter is required
, but not visible
. All default values must match the appropriate validator
if type
is number or string, or be one of the values in the enumValues
array if type
is enumeration.
The enumValues
attribute is a JSON array of alphanumeric values specifying the acceptable values for this parameter. This attribute only exists for enumeration parameter types.
The required
attribute specifies whether the parameter must be specified during a job submission.
The visible
attribute takes a boolean value specifying whether the parameter should be accepted as a user-supplied value in a job request. If false, the value will be ignored at job submission and the default
value will be used instead. Whenever visible
is set to false, required
must be true.
The order
attribute is used to specify the order in which parameters should be listed in the response from the API and in command-line generation. By default, order
is set to 0. Thus, providing a value greater than zero is sufficient to force any single parameter to be listed last.
Validating parameters¶
The previous section covered different ways you can have Tapis validate and restrict the parameters to your app. When a user submits a job request, the checks are applied in the following order.
- visible
- required
- type
- validator / enumValues
Wrapper Templates¶
In order to run your application, you will need to create a wrapper template that calls your executable code. The wrapper template is a simple script that Tapis filters and executes to start your app. The filtering Tapis applies is to inject runtime values from the job request into the script, replacing the template variables that represent your app’s inputs and parameters.
The order in which wrapper templates are processed in HPC and Condor apps is as follows.
- environment variables injected.
- startupScript run.
- Scheduler directives prepended to the wrapper template.
- additionalDirectives concatenated after the scheduler directives.
- Custom modules concatenated after the additionalDirectives.
- inputs and parameters template variables replaced with values from the job request.
- Blacklisted commands, if present, are disabled in the script.
- Resulting script is written to the remote job execution folder and executed.
The order in which wrapper templates are processed in CLI apps is as follows.
- Shell environment sourced.
- environment variables injected.
- startupScript run.
- Custom modules prepended to the top of the wrapper.
- inputs and parameters template variables replaced with values from the job request.
- Blacklisted commands, if present, are disabled in the script.
- Resulting script is forked into the background immediately.
Environment¶
The runtime environment comes from the system definition. If you cannot change the system definition to suit your needs, handle the environment setup in your wrapper script, and ship whatever you need with your app’s assets.
Modules¶
See more about Modules and Lmod. Modules can be used to customize your environment, locate your application, and improve portability between systems. Tapis does not install or manage the module system on a particular machine; however, it does know how to interact with it. Specifying the modules needed to run your app, either in your wrapper template or in your system definition, can greatly help you during the development process.
Default job macros¶
Tapis provides information about the job, system, and user as predefined macros you can use in your wrapper templates. The full list of runtime job macros is given in the following table.
Variable | Description |
---|---|
AGAVE_JOB_APP_ID | The appId for which the job was requested. |
AGAVE_JOB_ARCHIVE | Binary boolean value indicating whether the current job will be archived after the wrapper template exits. |
AGAVE_JOB_ARCHIVE_SYSTEM | The system to which the job will be archived after the wrapper template exits. |
AGAVE_JOB_ARCHIVE_URL | The fully qualified URL to the archive folder where the job output will be copied if archiving is enabled, or the URL of the job output listing otherwise. |
AGAVE_JOB_ARCHIVE_PATH | The path on the archiveSystem where the job output will be copied if archiving is enabled. |
AGAVE_JOB_BATCH_QUEUE | The batch queue on the AGAVE_JOB_EXECUTION_SYSTEM to which the job was submitted. |
AGAVE_JOB_EXECUTION_SYSTEM | The Tapis execution system id where this job is running. |
AGAVE_JOB_ID | The unique identifier of the job. |
AGAVE_JOB_MEMORY_PER_NODE | The amount of memory per node requested at submit time. |
AGAVE_JOB_NAME | The slugified version of the name of the job. See the section on Special Characters for more information about slugs. |
AGAVE_JOB_NAME_RAW | The name of the job as given at submit time. |
AGAVE_JOB_NODE_COUNT | The number of nodes requested at submit time. |
AGAVE_JOB_OWNER | The username of the job owner. |
AGAVE_JOB_PROCESSORS_PER_NODE | The number of cores requested at submit time. |
AGAVE_JOB_SUBMIT_TIME | The time at which the job was submitted in ISO-8601 format. |
AGAVE_JOB_TENANT | The id of the tenant to which the job was submitted. |
AGAVE_JOB_CALLBACK_RUNNING | Represents a call back to the API stating the job has started. |
AGAVE_JOB_CALLBACK_CLEANING_UP | Represents a call back to the API stating the job is cleaning up. |
AGAVE_JOB_CALLBACK_ALIVE | Represents a call back to the API stating the job is still alive. This will essentially update the timestamp on the job and add an entry to the job's history record. |
AGAVE_JOB_CALLBACK_NOTIFICATION | Represents a call back to the API telling it to forward a notification to the registered endpoint for that job. If no notification is registered, this will be ignored. |
AGAVE_JOB_CALLBACK_FAILURE | Represents a call back to the API stating the job failed. Use this with caution as it will tell the API the job failed even if it has not yet completed. Upon receiving this callback, Tapis will abandon the job and skip any archiving that may have been requested. Think of this as kill -9 for the job lifecycle. |
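As an illustration, a minimal hypothetical wrapper template might combine input/parameter placeholders with the predefined macros above. Here the template is simply written to a file so the placeholders stay literal; wc_input and printLongest are invented variable names, not part of any real app:

```shell
# Write a toy wrapper template to a file; every ${...} placeholder below is
# replaced textually by the Jobs service before the script is executed.
cat > wrapper.txt <<'EOF'
# Job metadata macros are available alongside inputs and parameters
echo "Running job ${AGAVE_JOB_ID} as ${AGAVE_JOB_OWNER}"

# ${wc_input} and ${printLongest} come from the job request
wc ${printLongest} "${wc_input}" > output.txt

# Optional callback telling Tapis the job is still alive
${AGAVE_JOB_CALLBACK_ALIVE}
EOF
```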
Input data¶
Tapis will stage the files and folders you specify as inputs to your app. These will be available in the top level of your job directory at runtime. Additionally, the names of each of the inputs will be injected into your wrapper template for you to use in your application logic. Please be aware that Tapis will not attempt to resolve namespace conflicts between your app inputs. That means that if a job specifies two inputs with the same name, one will overwrite the other during the input staging phase of the job and, though the variable names will be correctly injected into the wrapper script, your job will most likely fail due to missing data.
See the table below for fields that must be defined for an app’s inputs:
Field | Mandatory | Type | Description |
---|---|---|---|
id | X | string | This is the "name" of the file. You will use this in your wrapper script later whenever you need to refer to the BAM file being sorted |
value.default | | string | Path to the default value for the input |
value.order | | integer | Ignore for now |
value.required | X | boolean | Is specification of this input mandatory to run a job? |
value.validator | | string | Perl-format regular expression to restrict valid values |
value.visible | | boolean | When a UI is automatically generated, should this field be visible to end users? |
semantics.ontology | | array[string] | List of ontology terms (or URIs pointing to ontology terms) applicable to the input format |
semantics.minCardinality | | integer | Minimum number of values accepted for this input |
semantics.maxCardinality | | integer | Maximum number of values accepted for this input |
semantics.fileTypes | X | array[string] | List of Tapis file types accepted. Always use "raw-0" for the time being |
details.description | | string | Human-readable description of the input. Often implemented as contextual help in an automatically generated UI |
details.label | | string | Human-readable label for the input. Often implemented as a text label next to the field in an automatically generated UI |
details.argument | | string | The command-line argument associated with specifying this input at run time |
details.showArgument | | boolean | Include the argument in the substitution done by Tapis when a run script is generated |
Variable injection¶
If you refer back to the app definition we used in the App Management Tutorial, you will see there are multiple inputs and parameters defined for that app. Each input and parameter object has an id attribute. That id value is the attribute name you use to associate runtime values with app inputs and parameters. When a job is submitted to Tapis, prior to physically running the wrapper template, all instances of that id are replaced with the actual value from the job request. The example below shows our app description, a job request, and the resulting wrapper template at run time.
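To make the substitution concrete, here is a hypothetical before-and-after for a single input; the names come from the samtools sort example used later in this tutorial:

```
App description declares:   "id": "inputBam"
Job request supplies:       "inputBam": "agave://data.iplantcollaborative.org/nryan/ex1.bam"

Wrapper template, as authored:
    samtools sort ${inputBam} ${outputPrefix}

Wrapper script, after injection (the input file is staged into the job directory):
    samtools sort ex1.bam sorted
```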
Type declarations¶
During the job submission process, Tapis stores your inputs and parameters as serialized JSON. At the point that variable injection occurs, Tapis replaces all occurrences of each input and parameter id with the value provided in the job request. For Tapis to properly identify your input and parameter ids, wrap them in curly brackets and prepend a dollar sign. For example, if you have a parameter with id param1, you would include it in your wrapper script as ${param1}. Case sensitivity is honored at all times.
Boolean values¶
Boolean values are passed into the wrapper template as truthy values: true becomes 1, and false becomes an empty string.
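A wrapper can take advantage of this by testing for the empty (false) case before doing any numeric comparison. The sketch below hard-codes nameSort="1" to stand in for the value Tapis would inject for a hypothetical boolean parameter:

```shell
#!/bin/bash
# "nameSort" is a hypothetical boolean parameter id; in a real wrapper
# template this line would read nameSort=${nameSort}, and Tapis would
# inject 1 for true or an empty string for false.
nameSort="1"

ARGS=""
# Check for the empty (false) case first so the numeric comparison
# never sees an empty operand
if [ -n "${nameSort}" ] && [ "${nameSort}" -eq 1 ]; then
    ARGS="${ARGS} -n"
fi
echo "ARGS=${ARGS}"
# prints: ARGS= -n
```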
Cardinality¶
Cardinality is not used in resolving wrapper template variables.
Parameter Flags¶
If your parameter was of type “flag”, Tapis will replace all occurrences of the template variable with the value you provided for the argument field.
App packaging¶
Tapis API apps have a generalized structure that allows them to carry dependencies around with them. In the case below, package-name-version.dot.dot is a folder that you build on your local system, then store in your Tapis Cloud Storage in a designated location (we recommend /home/username/applications/app_folder_name). It contains binaries, support scripts, test data, etc., all in one package. Tapis essentially uses a very rough form of containerized applications (more on this later). We suggest you set your apps up to look something like the following:
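The directory layout below is one workable sketch, mirroring the samtools bundle built later in this tutorial; only the bundle archive, wrapper template, and test script are strictly referenced by the rest of this guide:

```
samtools-0.1.19/
└── stampede2/
    ├── bin.tgz             # compiled binaries and support scripts, archived
    ├── test/               # test data
    ├── test-sort.sh        # test script you can run directly on the executionSystem
    ├── sort.template       # wrapper template Tapis uses to generate the run script
    └── samtools-sort.json  # the Tapis app description
```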
Tapis runs a job by first transferring a copy of this directory into a temporary directory on the target executionSystem. Then, the input data files (we’ll show you how to specify those later) are staged into place automatically. Next, Tapis writes a scheduler submit script (using the template you provide, i.e. script.template) and puts it in the queue on the target system. The Tapis service then monitors the progress of the job and, assuming it completes, copies all newly created files to the location specified when the job was submitted. Along the way, critical milestones and metadata are recorded in the job’s history.
Tapis app development proceeds via the following steps:
- Build the application locally on the executionSystem
- Ensure that you are able to run it directly on the executionSystem
- Describe the application using a Tapis app description
- Create a shell template for running the app
- Upload the application directory to a storageSystem
- Post the app description to the Tapis apps service
- Debug your app by running jobs and updating the app until it works as intended
- (Optional) Share the app with some friends to let them test it
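The registration step, posting the app description to the apps service, follows the same curl conventions used elsewhere in this guide. This is a sketch: samtools-sort.json is a placeholder for your own app description, and the tenant URL may differ.

```shell
# Register (POST) the app description with the apps service
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -X POST --data-binary @samtools-sort.json \
  https://agave.iplantc.org/apps/v2?pretty=true
```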
Application metadata¶
Field | Mandatory | Type | Description |
---|---|---|---|
checkpointable | X | boolean | Application supports checkpointing |
defaultMemoryPerNode | | integer | Default RAM (GB) to request per compute node |
defaultProcessorsPerNode | | integer | Default processor count to request per compute node |
defaultMaxRunTime | | string | Default maximum run time (hours:minutes:seconds) to request per job |
defaultNodeCount | | integer | Default number of compute nodes per job |
defaultQueue | | string | On HPC systems, default batch queue for jobs |
deploymentPath | X | string | Path relative to homeDir on deploymentSystem where the application bundle will reside |
deploymentSystem | X | string | The Tapis-registered STORAGE system, on which you have write permissions, where the app bundle resides |
executionSystem | X | string | A Tapis-registered EXECUTION system, on which you have execute and app registration permissions, where jobs will run |
helpURI | X | string | A URL pointing to help or a description for the app you are deploying |
label | X | string | Human-readable title for the app |
longDescription | | string | A short paragraph describing the functionality of the app |
modules | | array[string] | Ordered list of modules on systems that use lmod or modules |
name | X | string | Unique, URL-compatible (no special characters or spaces) name for the app |
ontology | X | array[string] | List of ontology terms (or URIs pointing to ontology terms) associated with the app |
parallelism | X | string | Is your application capable of using more than a single compute node? (SERIAL or PARALLEL) |
shortDescription | X | string | Brief description of the app |
storageSystem | X | string | The Tapis-registered STORAGE system on which you have write permissions. Default source of and destination for data consumed and emitted by the app |
tags | | array[string] | List of human-readable tags for the app |
templatePath | X | string | Path to the shell template file, relative to deploymentPath |
testPath | X | string | Path to the shell test file, relative to deploymentPath |
version | X | string | Preferred format: major.minor.point integer values for the app |
Warning: The combination of name and version must be unique across the entire Tapis API namespace.
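Put together, a minimal metadata stanza for the samtools example used later in this tutorial might look like the following sketch; every value here is a placeholder to adapt (nryan is a stand-in username, and the system ids must match systems registered in your tenant):

```json
{
  "name": "nryan-samtools-sort",
  "version": "0.1.19",
  "label": "SAMtools sort",
  "shortDescription": "Sort a BAM file by coordinate or name",
  "helpURI": "http://samtools.sourceforge.net/",
  "parallelism": "SERIAL",
  "checkpointable": false,
  "executionSystem": "stampede2.tacc.utexas.edu",
  "deploymentSystem": "data.iplantcollaborative.org",
  "deploymentPath": "nryan/applications/samtools-0.1.19/stampede2",
  "storageSystem": "data.iplantcollaborative.org",
  "templatePath": "sort.template",
  "testPath": "test-sort.sh",
  "ontology": ["http://sswapmeet.sswap.info/agave/apps/Application"],
  "tags": ["samtools", "sorting"],
  "inputs": [],
  "parameters": [],
  "outputs": []
}
```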
Parameter metadata¶
Field | Mandatory | Type | Description |
---|---|---|---|
id | X | string | This is the "name" of the parameter. At runtime, it will be replaced in your script template based on the value passed as part of the job specification |
value.default | | string | The default value for this parameter |
value.order | | integer | Ignore for now. Supports automatic generation of command lines. |
value.required | | boolean | Is specification of this parameter mandatory to run a job? |
value.type | | string | JSON type for this parameter (used to generate and validate UI). Valid values: "string", "number", "enumeration", "bool", "flag" |
value.validator | | string | Perl-format regular expression to restrict valid values |
value.visible | | boolean | When a UI is automatically generated, should this field be visible to end users? |
semantics.ontology | | array[string] | List of ontology terms (or URIs pointing to ontology terms) applicable to the parameter. We recommend at least specifying an XML Schema Simple Type. |
details.description | | string | Human-readable description of the parameter. Often used to create contextual help in an automatically generated UI |
details.label | | string | Human-readable label for the parameter. Often implemented as a text label next to the field in an automatically generated UI |
details.argument | | string | The command-line argument associated with specifying this parameter at run time |
details.showArgument | | boolean | Include the argument in the substitution done by Tapis when a run script is generated |
Output metadata¶
Field | Mandatory | Type | Description |
---|---|---|---|
id | X | string | This is the "name" of the output. It is not currently used by the wrapper script but may be in the future |
value.default | | string | If your app has a fixed-name output, specify it here |
value.order | | integer | Ignore for now |
value.required | X | boolean | Is specification of this output mandatory to run a job? |
value.validator | | string | Perl-format regular expression used to match output files |
value.visible | | boolean | When a UI is automatically generated, should this field be visible to end users? |
semantics.ontology | | array[string] | List of ontology terms (or URIs pointing to ontology terms) applicable to the output format |
semantics.minCardinality | | integer | Minimum number of values expected for this output |
semantics.maxCardinality | | integer | Maximum number of values expected for this output |
semantics.fileTypes | X | array[string] | List of Tapis file types that may apply to the output. Always use "raw-0" for the time being |
details.description | | string | Human-readable description of the output |
details.label | | string | Human-readable label for the output |
details.argument | | string | The command-line argument associated with specifying this output at run time (not currently used) |
details.showArgument | | boolean | Include the argument in the substitution done by Tapis when a run script is generated (not currently used) |
Note: If the app you are working on doesn’t natively produce output with a predictable name, one thing you can do is add extra logic to your script to take the existing output and rename it to something you can control or predict.
Tools and Utilities¶
- Stumped for ontology terms to apply to your Tapis app inputs, outputs, and parameters? You can search EMBL-EBI for ontology terms, and BioPortal can provide links to EDAM.
- Need to validate JSON files? Try JSONlint or JSONparser
Build a samtools application bundle¶
# Log into Stampede
ssh stampede2.tacc.utexas.edu
# Unload system's samtools module if it happens to be loaded by default
module unload samtools
# All TACC systems have a directory that can be accessed as $WORK
cd $WORK
# Set up a project directory
mkdir tacc_prod
mkdir tacc_prod/src
mkdir -p tacc_prod/samtools-0.1.19/stampede2/bin
mkdir -p tacc_prod/samtools-0.1.19/stampede2/test
# Build samtools using the Intel C Compiler
# If you don't have icc, gcc will work but icc usually gives more efficient binaries
cd tacc_prod/src
wget "http://downloads.sourceforge.net/project/samtools/samtools/0.1.19/samtools-0.1.19.tar.bz2"
tar -jxvf samtools-0.1.19.tar.bz2
cd samtools-0.1.19
make CC=icc CFLAGS='-xCORE-AVX2 -axCORE-AVX512,MIC-AVX512 -O3'
# Copy the samtools binary and support scripts to the project bin directory
cp -R samtools bcftools misc ../../samtools-0.1.19/stampede2/bin/
cd ../../samtools-0.1.19/stampede2
# Test that samtools will launch
bin/samtools
Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.19-44428cd
Usage: samtools <command> [options]
Command: view SAM <-> BAM conversion
sort sort alignment file
mpileup multi-way pileup...
# Package up the bin directory as a compressed archive
# and remove the original. This preserves the execute bit
# and other permissions and consolidates movement of all
# bundled dependencies in bin to a single operation. You
# can adopt a similar approach with lib and include.
tar -czf bin.tgz bin && rm -rf bin
Run samtools sort locally¶
Your first objective is to create a script that you know will run to completion under the Stampede scheduler and environment (or whatever executionSystem you’re working on). It will serve as a model for the template file you create later. In our case, we need to write a script that can be submitted to the Slurm scheduler. The standard is to use Bash for such scripts. You have five main objectives in your script:
- Unpack binaries from bin.tgz
- Extend your PATH to contain bin
- Craft some option-handling logic to accept parameters from Tapis
- Craft a command line invocation of the application you will run
- Clean up when you’re done
First, you will need some test data in your current directory (i.e., $WORK/tacc_prod/samtools-0.1.19/stampede2/ ). You can use this test file
tapis files download agave://tacc.work.taccusershared/iplantcollaborative/example_data/Samtools_mpileup/ex1.bam
or you can use any other BAM file for your testing purposes. If you use another file, make sure to change the filename in your test script accordingly!
Now, author your script. You can paste the following code into a file called test-sort.sh or you can copy it from here.
#!/bin/bash
# Tapis automatically writes these scheduler
# directives when you submit a job but we have to
# do it by hand when writing our test
#SBATCH -p development
#SBATCH -t 00:30:00
#SBATCH -n 16
#SBATCH -A tacc.prod
#SBATCH -J test-samtools
#SBATCH -o test-samtools.o%j
# Set up inputs and parameters
# We're emulating passing these in from Tapis
# inputBam is the name of the file to be sorted
inputBam="ex1.bam"
# outputPrefix is a parameter that establishes
# the prefix for the final sorted file
outputPrefix="sorted"
# Parameter for memory used in sort operation, in bytes
maxMemSort=500000000
# Boolean: Sort by name instead of coordinate
nameSort=0
# Unpack the bin.tgz file containing samtools binaries
# If you are relying entirely on system-supplied binaries
# you don't need this bit
tar -xvf bin.tgz
# Extend PATH to include binaries in bin
# If you need to extend lib, include, etc
# the same approach is applicable
export PATH=$PATH:"$PWD/bin"
# Dynamically construct a command line
# by building an ARGS string then
# adding the command, file specifications, etc
#
# We're doing this in a way familiar to Tapis V1 users
# first. Later, we'll illustrate how to make use of
# Tapis V2's new parameter passing functions
#
# Start with empty ARGS...
ARGS=""
# Add -m flag if maxMemSort was specified
# You might want to add a constraint for how large maxMemSort
# can be based on the available memory on your executionSystem
if [ ${maxMemSort} -gt 0 ]; then ARGS="${ARGS} -m $maxMemSort"; fi
# Boolean handler for -named sort
if [ ${nameSort} -eq 1 ]; then ARGS="${ARGS} -n "; fi
# Run the actual program
samtools sort ${ARGS} ${inputBam} ${outputPrefix}
# Now, delete the bin/ directory
rm -rf bin
Submit the job to the queue on Stampede…¶
chmod 700 test-sort.sh
sbatch test-sort.sh
You can monitor your jobs in the queue using
showq -u your_tacc_username
Assuming all goes according to plan, you’ll end up with a sorted BAM called sorted.bam, and your bin directory (but not the bin.tgz file) should be erased. Congratulations, you’re in the home stretch: it’s time to turn the test script into a Tapis app.
Craft a Tapis app description¶
In order for Tapis to know how to run an instance of the application, we need to provide quite a bit of metadata about the application. This includes a unique name and version, the location of the application bundle, the identities of the execution system and destination system for results, whether it's an HPC or other kind of job, the default number of processors and memory it needs to run, and of course, all the inputs and parameters for the actual program. It seems a bit over-complicated, but only because you're comfortable with the command line already. Your goal here is to allow your applications to be portable across systems and present a web-enabled, rationalized interface for your code to consumers.
Rather than have you write a description for “samtools sort” from scratch, let’s systematically dissect an existing file provided with the SDK. Go ahead and copy the file into place and open it in your text editor of choice. If you don’t have the SDK installed, you can download the JSON descriptions here.
cd $WORK/tacc_prod/samtools-0.1.19/stampede2/
wget 'https://raw.githubusercontent.com/TACC-Cloud/agave-docs/doc_changes/docs/agave/guides/apps/samtools-sort.json'
Open up samtools-sort.json in a text editor or in your web browser and follow along below.
Overview¶
Your file samtools-sort.json is written in JSON, and conforms to a Tapis-specific data model. We will dive into key elements here:
To make this file work for you, you will be, at a minimum, editing:

- Its executionSystem to match your private instance of Stampede.
- Its deploymentPath to match your iPlant applications path.
- The name of the app to something besides “samtools-sort”. We recommend “$your_cyverse_username-samtools-sort”.
Instructions for making these changes will follow.
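As an optional shortcut, the three edits can also be scripted rather than made by hand. The sketch below assumes python3 is available and that samtools-sort.json has already been downloaded to the current directory; nryan and the system id are placeholder values to replace with your own:

```shell
# Patch name, executionSystem, and deploymentPath in samtools-sort.json
python3 - <<'EOF'
import json

with open('samtools-sort.json') as f:
    app = json.load(f)

# Placeholder values -- substitute your own username and systems
app['name'] = 'nryan-samtools-sort'
app['executionSystem'] = 'stampede2.tacc.utexas.edu'
app['deploymentPath'] = 'nryan/applications/samtools-0.1.19/stampede2'

with open('samtools-sort.json', 'w') as f:
    json.dump(app, f, indent=2)
EOF
```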
All Tapis application descriptions have the following structure:
{ "application_metadata":"value",
"inputs":[],
"parameters":[],
"outputs":[]
}
There is a defined list of application metadata fields, some of which are mandatory. Inputs, parameters, and outputs are specified as an array of simple data structures, which are described earlier in the Application metadata section.
Inputs¶
To tell Tapis what files to stage into place before job execution, you need to define the app’s inputs in a JSON array. To implement the SAMtools sort app, you need to tell Tapis that a BAM file is needed to act as the subject of our sort:
{
"id":"inputBam",
"value":{
"default":"",
"order":0,
"required":true,
"validator":"",
"visible":true
},
"semantics":{
"ontology":[
"http://sswapmeet.sswap.info/mime/application/X-bam"
],
"minCardinality":1,
"fileTypes":[
"raw-0"
]
},
"details":{
"description":"",
"label":"The BAM file to sort",
"argument":null,
"showArgument":false
}
}
For information on what these fields mean, see the input metadata table.
Note: In this CyVerse-oriented tutorial, we assume you will stage data to and from “data.iplantcollaborative.org”, the default storage system for CyVerse users. In this case, you can use paths relative to homeDir on that system (e.g. vaughn/analyses/foobar). To add portability, marshal data from other storageSystems, or import from public servers, you can also specify fully qualified URIs as follows:
- storageSystem namespace: agave://storage-system-name/path/to/file
- public URI namespace: https://www.cnn.com/index.html
Parameters¶
Parameters are specified in a JSON array, and are broadly similar to inputs. Here’s an example of the parameter we will define allowing users to specify how much RAM to use in a “samtools sort” operation.
{
"id":"maxMemSort",
"value":{
"default":"500000000",
"order":1,
"required":true,
"type":"number",
"validator":"",
"visible":true
},
"semantics":{
"ontology":[
"xs:integer"
]
},
"details":{
"description":null,
"label":"Maxiumum memory in bytes, used for sorting",
"argument":"-m",
"showArgument":false
}
}
For information on what these fields mean, see the parameters metadata table.
Outputs¶
While we don’t support outputs 100% yet, Tapis apps are designed to participate in workflows. Thus, just as we define the list of valid and required inputs to an app, we also must (when we know them) define a list of its outputs. This allows an app to “advertise” to consumers of Tapis services what it expects to emit, allowing apps to be chained together. Note that unlike inputs and parameters, output “id”s are NOT passed to the template file. If you must specify an output filename in the application JSON, do it as a parameter! Outputs are defined basically the same way as inputs:
{
"id":"bam",
"value":{
"default":"sorted.bam",
"order":0,
"required":false,
"validator":"",
"visible":true
},
"semantics":{
"ontology":[
"http://sswapmeet.sswap.info/mime/application/X-bam"
],
"minCardinality":1,
"fileTypes":[
"raw-0"
]
},
"details":{
"description":"",
"label":"Sorted BAM file",
"argument":null,
"showArgument":false
}
}
For more info on these fields, see Output metadata table.
Craft a shell script template¶
Create sort.template using your test-sort.sh script as the starting point.
cp test-sort.sh sort.template
Now, open sort.template in the text editor of your choice. Delete the bash shebang line and the SLURM pragmas. Replace the hard-coded values for inputs and parameters with variables defined by your app description.
# Set up inputs...
# Since we don't check these when constructing the
# command line later, these will be marked as required
inputBam=${inputBam}
# and parameters
outputPrefix=${outputPrefix}
# Maximum memory for sort, in bytes
# Be careful: neither Tapis nor the scheduler will
# check that this is a reasonable value. In production
# you might want to code min/max for this value
maxMemSort=${maxMemSort}
# Boolean: Sort by name instead of coordinate
nameSort=${nameSort}
# Unpack the bin.tgz file containing samtools binaries
tar -xvf bin.tgz
# Set the PATH to include binaries in bin
export PATH=$PATH:"$PWD/bin"
# Build up an ARGS string for the program
# Start with empty ARGS...
ARGS=""
# Add -m flag if maxMemSort was specified
if [ ${maxMemSort} -gt 0 ]; then ARGS="${ARGS} -m $maxMemSort"; fi
# Boolean handler for -named sort
if [ ${nameSort} -eq 1 ]; then ARGS="${ARGS} -n "; fi
# Run the actual program
samtools sort ${ARGS} $inputBam ${outputPrefix}
# Now, delete the bin/ directory
rm -rf bin
Note
Another example to create a custom app using the tapis-cli can be found at Create a custom App Example
Permissions¶
Apps have fine-grained permissions similar to those found in the Jobs and Files services. Using these, you can share your app with other Tapis users. App permissions are private by default, so when you first POST your app to the Apps service, you are the only one who can see it. You may share your app with other users by granting them varying degrees of permissions. The full list of app permission values is given in the following table.
Permission | Description |
---|---|
READ | Gives the ability to view the app description. |
WRITE | Gives the ability to update the app. |
EXECUTE | Gives the ability to submit jobs using the app |
ALL | Gives full READ and WRITE and EXECUTE permissions to the user. |
READ_WRITE | Gives full READ and WRITE permissions to the user |
READ_EXECUTE | Gives full READ and EXECUTE permissions to the user |
WRITE_EXECUTE | Gives full WRITE and EXECUTE permissions to the user |
App permissions are distinct from all other roles and permissions and do not have implications outside the Apps service. This means that if you want to allow someone to run a job using your app, it is not sufficient to grant them READ_EXECUTE permissions on your app. They must also have an appropriate user role on the execution system on which the app will run. Similarly, if you do not have the right to publish on the executionSystem or access the deploymentPath on the deploymentSystem in your app description, you will not be able to publish your app.
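In practice that means pairing an app grant with a system role grant. The sketch below follows the same curl conventions as the rest of this section; it assumes the systems service accepts role grants on its roles collection, and bgibson and $SYSTEM_ID are placeholders:

```shell
# Let bgibson run jobs with the app...
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -X POST -d "username=bgibson&permission=READ_EXECUTE" \
  https://agave.iplantc.org/apps/v2/$APP_ID/pems?pretty=true

# ...and give them a USER role on the execution system so the grant is usable
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
  -X POST -d "username=bgibson&role=USER" \
  https://agave.iplantc.org/systems/v2/$SYSTEM_ID/roles?pretty=true
```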
Listing permissions¶
App permissions are managed through a set of URLs consistent with the permission operations elsewhere in the API. To query for a user’s permission for an app, perform a GET on the user’s unique app permissions url.
You can use the following CLI command:
tapis apps pems show -v $APP_ID $USERNAME
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://agave.iplantc.org/apps/v2/$APP_ID/pems/$USERNAME?pretty=true
The response from the service will be a JSON object representing the user permission. If the user does not have a permission for that app, the permission value will be NONE. By default, only you have permission to your private apps. Public apps will return a single permission for the public meta user rather than a permission for every user.
{
"username": "$USERNAME",
"permission": {
"read": true,
"write": true,
"execute": true
},
"_links": {
"self": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/$USERNAME"
},
"app": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID"
},
"profile": {
"href": "https://agave.iplantc.org/profiles/v2/$USERNAME"
}
}
}
You can also query for all permissions granted on a specific app by making a GET request on the app’s permission collection.
tapis apps pems list -v $APP_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://agave.iplantc.org/apps/v2/$APP_ID/pems?pretty=true
This time the service will respond with a JSON array of permission objects.
[
{
"username": "$USERNAME",
"permission": {
"read": true,
"write": true,
"execute": true
},
"_links": {
"self": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/$USERNAME"
},
"app": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID"
},
"profile": {
"href": "https://agave.iplantc.org/profiles/v2/$USERNAME"
}
}
}
]
Adding and updating permissions¶
Setting permissions is done by posting a JSON object containing a permission and username. Alternatively, you can POST just the permission and append the username to the URL.
tapis apps pems grant -v $APP_ID bgibson READ
# Standard syntax to grant permissions to a specific user
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "username=bgibson&permission=READ" https://agave.iplantc.org/apps/v2/$APP_ID/pems?pretty=true
# Abbreviated POST data to grant permission to a single user
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "permission=READ" https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson?pretty=true
The response will contain a JSON object representing the permission that was just created.
{
"username": "bgibson",
"permission": {
"read": true,
"write": false,
"execute": false
},
"_links": {
"self": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson"
},
"app": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID"
},
"profile": {
"href": "https://agave.iplantc.org/profiles/v2/bgibson"
}
}
}
Deleting permissions¶
Permissions can be deleted on a user-by-user basis, or all at once. To delete an individual user permission, make a DELETE request on the user’s app permission URL.
tapis apps pems revoke -v $APP_ID $USERNAME
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson?pretty=true
The CLI response will be:
{
"username": "bgibson",
"permission": {
"read": true,
"write": false,
"execute": false
},
"_links": {
"self": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson"
},
"app": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID"
},
"profile": {
"href": "https://agave.iplantc.org/profiles/v2/bgibson"
}
}
}
Successfully removed permission for bgibson on app $APP_ID

The cURL response will be an empty result object.
You can accomplish the same thing by updating the user permission to an empty value.
tapis apps pems grant -v $APP_ID $USERNAME NONE
# Delete permission for a single user by updating with an empty permission value
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X POST -d "username=bgibson" -d "permission=NONE" \
https://agave.iplantc.org/apps/v2/$APP_ID/pems?pretty=true
# Delete permission for a single user by updating with an empty permission value
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X POST -d "permission=" \
https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson?pretty=true
Since this is an update operation, the resulting JSON permission object will be returned showing the user has no permissions to the app anymore.
{
"username": "bgibson",
"permission": {
"read": false,
"write": false,
"execute": false
},
"_links": {
"self": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID/pems/bgibson"
},
"app": {
"href": "https://agave.iplantc.org/apps/v2/$APP_ID"
},
"profile": {
"href": "https://agave.iplantc.org/profiles/v2/bgibson"
}
}
}
To delete all permissions for an app, make a DELETE request on the app’s permissions collection.
tapis apps pems drop $APP_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://agave.iplantc.org/apps/v2/$APP_ID/pems?pretty=true
The response will be an empty result object.
App Publishing¶
In addition to traditional permissions, apps also have a concept of scope. Unless otherwise configured, apps are private to the owner and the users to whom they grant permission. Applications can, however, move from the private space into the public space for use by anyone. Moving an app into the public space is called publishing. Publishing an app gives it much greater exposure and results in increased usage by the user community. It also comes with increased responsibilities for the original owner as well as the API administrators. Several of these are listed below:
- Public apps must run on public systems. This makes the app available to everyone.
- Public apps must be vetted for performance, reliability, and security by the API administrators.
- The original app author must remain available via email for ongoing support.
- Public apps must be copied into a public repository and checksummed.
- Updates to public apps must result in a snapshot of the original app being created and stored with its resulting checksum in a separate location.
- API administrators must maintain and support the app throughout its lifetime.
Note: If you have an app you would like to see published, please contact your API administrators for more information.
Publishing an app¶
To publish an app, make a PUT request on the app resource. In this example, we publish the wc-osg-1.00 app.
tapis apps publish -e condor.opensciencegrid.org wc-osg-1.00
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN"
-H "Content-Type: application/json"
-X PUT
--data-binary '{"action":"publish","executionSystem":"condor.opensciencegrid.org"}'
https://agave.iplantc.org/apps/v2/wc-osg-1.00?pretty=true
The response from the service will resemble the following:
{
"id": "wc-osg-1.00u1",
"name": "wc-osg",
"icon": null,
"uuid": "8734854070765284890-242ac116-0001-005",
"parallelism": "SERIAL",
"defaultProcessorsPerNode": 1,
"defaultMemoryPerNode": 1,
"defaultNodeCount": 1,
"defaultMaxRunTime": null,
"defaultQueue": null,
"version": "1.00",
"revision": 1,
"isPublic": false,
"helpURI": "http://www.gnu.org/s/coreutils/manual/html_node/wc-invocation.html",
"label": "wc condor",
"shortDescription": "Count words in a file",
"longDescription": "",
"tags": [
"gnu",
"textutils"
],
"ontology": [
"http://sswapmeet.sswap.info/algorithms/wc"
],
"executionType": "CONDOR",
"executionSystem": "condor.opensciencegrid.org",
"deploymentPath": "/agave/apps/wc-1.00",
"deploymentSystem": "public.storage.agave",
"templatePath": "/wrapper.sh",
"testPath": "/wrapper.sh",
"checkpointable": true,
"lastModified": "2016-09-15T04:48:17.000-05:00",
"modules": [
"load TACC",
"purge"
],
"available": true,
"inputs": [
{
"id": "query1",
"value": {
"validator": "",
"visible": true,
"required": false,
"order": 0,
"enquote": false,
"default": [
"read1.fq"
]
},
"details": {
"label": "File to count words in: ",
"description": "",
"argument": null,
"showArgument": false,
"repeatArgument": false
},
"semantics": {
"minCardinality": 1,
"maxCardinality": -1,
"ontology": [
"http://sswapmeet.sswap.info/util/TextDocument"
],
"fileTypes": [
"text-0"
]
}
}
],
"parameters": [],
"outputs": [
{
"id": "outputWC",
"value": {
"validator": "",
"order": 0,
"default": "wc_out.txt"
},
"details": {
"label": "Text file",
"description": "Results of WC"
},
"semantics": {
"minCardinality": 1,
"maxCardinality": 1,
"ontology": [
"http://sswapmeet.sswap.info/util/TextDocument"
],
"fileTypes": []
}
}
],
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/apps/v2/wc-osg-1.00u1"
},
"executionSystem": {
"href": "https://api.tacc.utexas.edu/systems/v2/condor.opensciencegrid.org"
},
"storageSystem": {
"href": "https://api.tacc.utexas.edu/systems/v2/public.storage.agave"
},
"history": {
"href": "https://api.tacc.utexas.edu/apps/v2/wc-osg-1.00u1/history"
},
"metadata": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%228734854070765284890-242ac116-0001-005%22%7D"
},
"owner": {
"href": "https://papi.tacc.utexas.edu/profiles/v2/nryan"
},
"permissions": {
"href": "https://api.tacc.utexas.edu/apps/v2/wc-osg-1.00u1/pems"
}
}
}
Notice a few things about the response.

- Both the executionSystem and deploymentSystem have changed. Public apps must run and store their assets on public systems.
- We did not specify the deploymentSystem where the public app assets should be stored, so Tapis placed them on the default public storage system, public.storage.agave.
- We did not specify the deploymentPath where the public app assets should be stored, so Tapis placed them in the publicAppsDir of the deployment system.
- The deploymentPath is now a zip archive rather than a folder. Tapis does this because, once published, the app can no longer be updated, so the assets are frozen and stored in a separate location, removed from user access.
- The id of the app has changed. It now has a u1 appended to the original app id. This indicates that it is a public app and that it has been updated a single time. If we were to publish the app again, the resulting id would be wc-osg-1.00u2. This differs from unpublished apps, whose revision number increments without impacting the app id. Every time you publish an app, the id of the resulting public app will change.
Disabling an App¶
Unpublishing a public system is equivalent to disabling it.
Unlike systems, it is not possible to unpublish an app. Once published, a deep copy of the app is stored in an external location with its own provenance trail. If you would like to remove a published app from further use, simply disable it.
tapis apps disable -v $APP_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X PUT -d "action=disable" \
https://agave.iplantc.org/apps/v2/$APP_ID?pretty=true
The response will look identical to before, but with available set to false.
Cloning an app¶
Oftentimes you will want to copy an existing app for use on another system, or simply to obtain a private copy of the app for your own use. This can be done using the clone functionality in the Apps service. The following tabs show how to do this using the unix curl command as well as the Tapis CLI.
tapis apps clone -n my-pyplot-demo -x 2.2 demo-pyplot-demo-advanced-0.1.0
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X PUT "https://agave.iplantc.org/apps/v2/$APP_ID?pretty=true" \
--data-urlencode action=clone \
--data-urlencode name=$NEW_APP_NAME \
--data-urlencode version=0.1.2 \
--data-urlencode deploymentSystem=$STORAGE_SYSTEM \
--data-urlencode executionSystem=$EXECUTION_SYSTEM
Note: When cloning public apps, the entire app bundle will be recreated on the deploymentSystem
you specify or your default storage system. The same is not true for private apps. Cloning a private app will copy the job description, but not the app bundle. This is to honor the original ownership of the assets and prevent them from leaking out to the public space without the owner’s permission. If you need direct access to the app’s assets, request that the owner give you read access to the folder listed as the deploymentPath in the app description.
Jobs¶
The Jobs service is a basic execution service that allows you to run applications registered with the Apps service across multiple, distributed, heterogeneous systems through a common REST interface. The service manages all aspects of execution and job management, from data staging, job submission, and monitoring to output archiving, event logging, sharing, and notifications. The Jobs service also provides a persistent reference to your job’s output data and a mechanism for sharing all aspects of your job with others. Each feature is described in more detail in the following sections.
Aloe Jobs Service (now in production)¶
Version 2.4 of the Jobs service is now in production. This version, code-named Aloe, is the rearchitected Jobs service with improved reliability, scalability, performance and serviceability.
A new version of the Jobs service documentation is being developed. Until the unified documentation is ready, please see the old Tapis Jobs service documentation for a basic understanding of the interface and the Aloe documentation in the links below for the up-to-date details.
The following links discuss details of the new production Jobs service:
Job submission¶
Job submission is a term recycled from shared batch computing environments, where a user would submit a request for a unit of computational work (called a job) to the batch scheduler, then head home for dinner while waiting for the computer to complete it.
Originally the batch scheduler was a person, and the term batch came from their ability to process several submissions together. Later on, as human schedulers were replaced by software, the term stuck even though the process remained unchanged. Today the term job submission means essentially the same thing: a user submits a request for a unit of work to be done.
The primary difference is that today the wait time between submission and execution is often considerably less. On shared systems, such as many of the HPC systems originally targeted by Tapis, waiting for your job to start is the price you pay for the incredible performance you get once your job starts.
Tapis, too, adopts the concept of job submission, though it is not in and of itself a scheduler. In the context of the Tapis Jobs service, the process of running an application registered with the Apps service is referred to as submitting a job.
Unlike in the batch scheduling world, where each scheduler has its own job submission syntax and its own idiosyncrasies, the mechanism for submitting a job to Tapis is consistent regardless of the application or system on which you run. An HTML form or JSON object is posted to the Jobs service. The submission is validated, and the job is forwarded to the scheduling and execution services for processing.
Because Tapis takes an app-centric view of science, execution does not require knowing about the underlying systems on which an application runs. Simply knowing the parameters and inputs you want to use when running an app is sufficient to define a job. Tapis will handle the rest.
As mentioned previously, jobs are submitted by making an HTTP POST request containing either an HTML form or a JSON object to the Jobs service. All job submissions must include a few mandatory values that define a basic unit of work. Table 1 lists the optional and required attributes of all job submissions.
Name | Value(s) | Description |
---|---|---|
name | string | Descriptive name of the job. This will be slugified and used as one component of directory names in certain situations. |
appId | string | The unique name of the application being run by this job. This must be a valid application that the calling user has permission to run. |
batchQueue | string | The batch queue on the execution system to which this job is submitted. Defaults to the app's defaultQueue property if specified. Otherwise a best-fit algorithm is used to match the job parameters to a queue on the execution system with sufficient capabilities to run the job. |
nodeCount | integer | The number of nodes to use when running this job. Defaults to the app's defaultNodes property or 1 if no default is specified. |
processorsPerNode | integer | The number of processors this application should utilize while running. Defaults to the app's defaultProcessorsPerNode property or 1 if no default is specified. If the application is not of executionType PARALLEL, this should be 1. |
memoryPerNode | string | The maximum amount of memory needed per node for this application to run given in ####.#[E|P|T|G]B format. Defaults to the app's defaultMemoryPerNode property if it exists. GB are assumed if no magnitude is specified. |
maxRunTime | string | The estimated compute time needed for this application to complete given in hh:mm:ss format. This value must be less than or equal to the max run time of the queue to which this job is assigned. |
notifications* | JSON array | An array of one or more JSON objects describing an event and url which the service will POST to when the given event occurs. For more on Notifications, see the section on webhooks below. |
archive* | boolean | Whether the output from this job should be archived. If true, all new files created by this application's execution will be archived to the archivePath in the user's default storage system. |
archiveSystem* | string | System to which the job output should be archived. Defaults to the user's default storage system if not specified. |
archivePath* | string | Location where the job output should be archived. A relative path or absolute path may be specified. If not specified, a unique folder will be created in the user's home directory of the archiveSystem at 'archive/jobs/job-$JOB_ID' |
Table 1. The optional and required attributes common to all job submissions. Optional fields are marked with an asterisk.
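The attributes above can be assembled into a request body programmatically. Below is a minimal Python sketch, where the helper names (build_job_request, submit_job), host, and token are illustrative placeholders rather than part of the Tapis API; the request shape follows the JSON examples in this guide.

```python
import json
import urllib.request

def build_job_request(name, app_id, inputs=None, parameters=None, archive=False):
    """Assemble a minimal JSON job request.

    Only name and appId are strictly required; omitted attributes fall
    back to the app's defaults as described in Table 1.
    """
    body = {"name": name, "appId": app_id, "archive": archive}
    if inputs:
        body["inputs"] = inputs
    if parameters:
        body["parameters"] = parameters
    return body

def submit_job(base_url, token, job):
    """POST the job request to the Jobs service and return the parsed reply."""
    req = urllib.request.Request(
        base_url + "/jobs/v2/?pretty=true",
        data=json.dumps(job).encode(),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

job = build_job_request(
    "pyplot-demo test",
    "demo-pyplot-demo-advanced-0.1.0",
    inputs={"dataset": ["agave://storage.system/inputs/testdata.csv"]},
)
# Submitting would then look like:
#   result = submit_job("https://agave.iplantc.org", access_token, job)
```

Keeping request construction separate from the HTTP call makes it easy to validate a job description before sending it.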
Note
In this tutorial we will use JSON for our examples; however, one could replace the JSON object with an HTML form, mapping JSON attributes and values to HTML form attributes and values one for one, and get the same results. The exception is the notifications attribute, which is not accepted in an HTML form submission and would need to be added after submitting the job request by sending each of the notification objects, with the returned job id, to the Notifications API.
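To illustrate the one-for-one mapping described in the note, the following Python sketch (the helper name is hypothetical, not part of the API) flattens a JSON job request into HTML form fields, repeating a field name once per value for multi-valued inputs and skipping notifications:

```python
def job_json_to_form_fields(job):
    """Flatten a JSON job request into (name, value) form pairs.

    Scalar top-level attributes map one for one; multi-valued inputs
    and parameters repeat the field name once per value. The
    notifications attribute is skipped because the form interface
    does not accept it (see the note above).
    """
    pairs = []
    for key, value in job.items():
        if key == "notifications":
            continue  # must be added later via the Notifications API
        if key in ("inputs", "parameters"):
            for name, val in value.items():
                vals = val if isinstance(val, list) else [val]
                pairs.extend((name, str(v)) for v in vals)
        else:
            pairs.append((key, str(value)))
    return pairs

fields = job_json_to_form_fields({
    "name": "pyplot-demo test",
    "appId": "demo-pyplot-demo-advanced-0.1.0",
    "inputs": {"dataset": ["a.csv", "b.csv"]},
    "parameters": {"width": 1024},
})
```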
In addition to the standard fields for all jobs, the application you specify in the appId field will also have its own set of inputs and parameters specified during registration that are unique to that app. (For more information about app registration and descriptions, see the Apps section.)
The following snippet shows a sample JSON job request that could be submitted to the Jobs service to run the pyplot-0.1.0 app from the Advanced App Example tutorial.
{
"name":"pyplot-demo test",
"appId":"demo-pyplot-demo-advanced-0.1.0",
"inputs":{
"dataset":[
"agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv",
"agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata2.csv"
]
},
"archive":false,
"parameters":{
"unpackInputs":false,
"chartType":[
"bar",
"line"
],
"width":1024,
"height":512,
"background":"#d96727",
"showYLabel":true,
"ylabel":"The Y Axis Label",
"showXLabel":true,
"xlabel":"The X Axis Label",
"showLegend":true,
"separateCharts":false
},
"notifications":[
{
"url":"$API_EMAIL",
"event":"RUNNING"
},
{
"url":"$API_EMAIL",
"event":"FINISHED"
},
{
"url":"http://requestbin.agaveapi.co/o1aiawo1?job_id=${JOB_ID}&status=${JOB_STATUS}",
"event":"*",
"persistent":true
}
]
}
Notice that this example specifies a single input attribute, dataset
. The pyplot-0.1.0
app definition specified that the dataset
input attribute could accept more than one value (maxCardinality = 2). In the job request object, that translates to an array of string values. Each string represents a piece of data that Tapis will transfer into the job work directory prior to job execution. Any value accepted by the Files service when importing data is accepted here. Some examples of valid values are given in the following table.
Name | Description |
---|---|
inputs/pyplot/testdata.csv | A relative path on the user's default storage system. |
/home/apiuser/inputs/pyplot/testdata.csv | An absolute path on the user's default storage system. |
agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv | A Tapis URL explicitly specifying a source system and relative path. |
agave://$PUBLIC_STORAGE_SYSTEM//home/apiuser/inputs/pyplot/testdata.csv | A Tapis URL explicitly specifying a source system and absolute path. |
http://example.com/inputs/pyplot/testdata.csv | Standard url with any supported transfer protocol. |
Table 2. Examples of different syntaxes that input values can be specified in the job request object. Here we assume that the validator for the input field is such that these would pass.
The example job request also specifies a parameters object with the parameters defined in the pyplot-0.1.0 app description. Notice that the parameter type specified in the app description is reflected here: numbers are given as numbers, not strings, and boolean and flag attributes are given as boolean true and false values. As with the input section, the chartType parameter accepts multiple values, which translates here to an array of string values. Had the parameter type required another primitive type, that type would be used in the array instead.
Finally, we see a notifications array specifying that we want Tapis to send three notifications related to this job. The first is a one-time email when the job starts running. The second is a one-time email when the job reaches a terminal state. The third is a webhook to the url we specified. More on notifications in the section on monitoring below.
Job submission validation¶
To get a template for the Job submission JSON for a particular app, you can use the following CLI command:
$ jobs-template $APP_ID > job.json
You can submit the job with the following CLI command:
$ tapis jobs submit -F job.json
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "@job.json" -H "Content-Type: application/json" https://agave.iplantc.org/jobs/v2/?pretty=true
If everything went well, you will receive a response that looks something like the following JSON object.
{
"status" : "success",
"message" : null,
"version" : "2.2.14-red7223e",
"result" : {
"id" : "$JOB_ID",
"name" : "$USERNAME-$APP_ID",
"owner" : "$USERNAME",
"appId" : "$APP_ID",
"executionSystem" : "$PUBLIC_EXECUTION_SYSTEM",
"batchQueue" : "normal",
"nodeCount" : 1,
"processorsPerNode" : 16,
"memoryPerNode" : 32.0,
"maxRunTime" : "01:00:00",
"archive" : false,
"retries" : 0,
"localId" : null,
"created" : "2018-01-26T15:01:44.000-06:00",
"lastModified" : "2018-01-26T15:01:45.000-06:00",
"outputPath" : null,
"status" : "PENDING",
"submitTime" : "2018-01-26T15:01:44.000-06:00",
"startTime" : null,
"endTime" : null,
"inputs" : {
"inputBam" : [ "agave://data.iplantcollaborative.org/shared/iplantcollaborative/example_data/Samtools_mpileup/ex1.bam" ]
},
"parameters" : {
"nameSort" : true,
"maxMemSort" : 800000000
},
"_links" : {
"self" : {
"href" : "https://agave.iplantc.org/jobs/v2/1674389564419740136-242ac113-0001-007"
},
"app" : {
"href" : "https://agave.iplantc.org/apps/v2/$APP_ID"
},
"executionSystem" : {
"href" : "https://agave.iplantc.org/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
},
"archiveSystem" : {
"href" : "https://agave.iplantc.org/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
},
"archiveData" : {
"href" : "https://agave.iplantc.org/jobs/v2/1674389564419740136-242ac113-0001-007/outputs/listings"
},
"owner" : {
"href" : "https://agave.iplantc.org/profiles/v2/$USERNAME"
},
"permissions" : {
"href" : "https://agave.iplantc.org/jobs/v2/1674389564419740136-242ac113-0001-007/pems"
},
"history" : {
"href" : "https://agave.iplantc.org/jobs/v2/1674389564419740136-242ac113-0001-007/history"
},
"metadata" : {
"href" : "https://agave.iplantc.org/meta/v2/data/?q=%7B%22associationIds%22%3A%221674389564419740136-242ac113-0001-007%22%7D"
},
"notifications" : {
"href" : "https://agave.iplantc.org/notifications/v2/?associatedUuid=1674389564419740136-242ac113-0001-007"
},
"notification" : [ ]
}
}
}
Job monitoring¶
Once you submit your job request, the job will be handed off to Tapis’s back-end execution service. Your job may run right away, or it may wait in a batch queue on the execution system until the required resources are available. Either way, the execution process occurs completely asynchronously from the submission process. To monitor the status of your job, Tapis supports two different mechanisms: polling and webhooks.
Note: For the sake of brevity, we placed a detailed explanation of the job lifecycle in a separate, aptly titled post, The Job Lifecycle. There you will find detailed information about how, when, and why everything moves from place to place, and how you can peek behind the curtains.
Polling¶
If you have ever taken a long road trip with children, you are probably painfully aware of how polling works. Starting several minutes from the time you leave the house, a child asks, “Are we there yet?” You reply, “No.” Several minutes later the child again asks, “Are we there yet?” You again reply, “No.” This process continues until you finally arrive at your destination. This is called polling, and polling is bad.
Polling for your job status works the same way. After submitting your job, you start a loop that queries the Jobs service for your job status until it detects that the job is in a terminal state. The following three URLs all return the status of your job. The first results in a list of abbreviated job descriptions; the second results in a full description of the job with the given $JOB_ID, exactly like that returned when submitting the job; the third results in a much smaller response object that contains only the job id and status.
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://agave.iplantc.org/jobs/v2/?pretty=true
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://agave.iplantc.org/jobs/v2/$JOB_ID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://agave.iplantc.org/jobs/v2/$JOB_ID/status
{
"id" : "$JOB_ID",
"name" : "$USERNAME-$APP_ID",
"owner" : "$USERNAME",
"appId" : "$APP_ID",
"executionSystem" : "$PUBLIC_EXECUTION_SYSTEM",
"batchQueue": "normal",
"nodeCount": 1,
"processorsPerNode": 16,
"memoryPerNode": 32,
"maxRunTime": "01:00:00",
"archive": false,
"retries": 0,
"localId": "659413",
"created": "2018-01-26T15:08:02.000-06:00",
"lastUpdated": "2018-01-26T15:09:55.000-06:00",
"outputPath": "$USERNAME/$JOB_ID-$APP_ID",
"status": "FINISHED",
"submitTime": "2018-01-26T15:09:45.000-06:00",
"startTime": "2018-01-26T15:09:53.000-06:00",
"endTime": "2018-01-26T15:09:55.000-06:00",
"inputs": {
"inputBam": [
"agave://data.iplantcollaborative.org/shared/iplantcollaborative/example_data/Samtools_mpileup/ex1.bam"
]
},
"parameters": {
"nameSort": true,
"maxMemSort": 800000000
},
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID"
},
"app": {
"href": "https://api.tacc.utexas.edu/apps/v2/$APP_ID"
},
"executionSystem": {
"href": "https://api.tacc.utexas.edu/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
},
"archiveSystem": {
"href": "https://api.tacc.utexas.edu/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
},
"archiveData": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/outputs/listings"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/$USERNAME"
},
"permissions": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems"
},
"history": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/history"
},
"metadata": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%22462259152402771480-242ac113-0001-007%22%7D"
},
"notifications": {
"href": "https://api.tacc.utexas.edu/notifications/v2/?associatedUuid=$JOB_ID"
}
}
}
The list of all possible job statuses is given in the following table.
Status | Description |
---|---|
PENDING | Job accepted and queued for submission |
PROCESSING_INPUTS | Identifying input files for staging |
STAGING_INPUTS | Transferring job input data to execution system |
STAGED | Job inputs staged to execution system |
STAGING_JOB | Staging runtime assets of the job to execution system |
SUBMITTING | Preparing job for execution and staging binaries to execution system |
QUEUED | Job successfully placed into queue |
RUNNING | Job started running |
PAUSED | Job execution paused by user |
CLEANING_UP | Job completed execution |
ARCHIVING | Transferring job output to archive system |
ARCHIVING_FINISHED | Job archiving complete |
ARCHIVING_FAILED | Job archiving failed |
FINISHED | Job complete |
KILLED | Job execution killed at user request |
STOPPED | Job execution intentionally stopped |
FAILED | Job failed |
HEARTBEAT | Job heartbeat received |
CREATED | The job was created |
UPDATED | The job was updated |
DELETED | The job was deleted |
PERMISSION_GRANT | Permission was granted for a user on this job |
PERMISSION_REVOKE | Permission was removed for a user on this job |
Job statuses listed in progressive order from job submission to completion, followed by terminal, informational, and event statuses.
Polling is a simple and effective approach, but it is bad practice for two reasons. First, it does not scale well. Querying for one job status every few seconds does not take much effort, but querying for 100 takes quite a bit of time and puts unnecessary load on Tapis’s servers. Second, polling provides what is effectively a binary response: it tells you whether a job is done or not done, but it does not give you any information on what is actually going on with the job or where it is in the overall execution process.
The job history URL provides much more detailed information on the various state changes, system messages, and progress information associated with data staging. The syntax of the job history URL is as follows:
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://agave.iplantc.org/jobs/v2/$JOB_ID/history?pretty=true
{
"status":"success",
"message":null,
"version":"2.1.0-r6d11c",
"result":[
{
"created":"2014-10-24T04:47:45.000-05:00",
"status":"PENDING",
"description":"Job accepted and queued for submission."
},
{
"created":"2014-10-24T04:47:47.000-05:00",
"status":"PROCESSING_INPUTS",
"description":"Attempt 1 to stage job inputs"
},
{
"created":"2014-10-24T04:47:47.000-05:00",
"status":"PROCESSING_INPUTS",
"description":"Identifying input files for staging"
},
{
"created":"2014-10-24T04:47:48.000-05:00",
"status":"STAGING_INPUTS",
"description":"Staging agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv to remote job directory"
},
{
"progress":{
"averageRate":0,
"totalFiles":1,
"source":"agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv",
"totalActiveTransfers":0,
"totalBytes":3212,
"totalBytesTransferred":3212
},
"created":"2014-10-24T04:47:48.000-05:00",
"status":"STAGING_INPUTS",
"description":"Copy in progress"
},
{
"created":"2014-10-24T04:47:50.000-05:00",
"status":"STAGED",
"description":"Job inputs staged to execution system"
},
{
"created":"2014-10-24T04:47:55.000-05:00",
"status":"SUBMITTING",
"description":"Preparing job for submission."
},
{
"created":"2014-10-24T04:47:55.000-05:00",
"status":"SUBMITTING",
"description":"Attempt 1 to submit job"
},
{
"created":"2014-10-24T04:48:08.000-05:00",
"status":"RUNNING",
"description":"Job started running"
},
{
"created":"2014-10-24T04:48:12.000-05:00",
"status":"CLEANING_UP"
},
{
"created":"2014-10-24T04:48:15.000-05:00",
"status":"FINISHED",
"description":"Job completed. Skipping archiving at user request."
}
]
}
Depending on the nature of your job and the reliability of the underlying systems, the response from this service can grow rather large, so it is important to be aware that this query can be an expensive call for your client application to make. Everything we said before about polling job status applies to polling job history with the additional caveat that you can chew through quite a bit of bandwidth polling this service, so keep that in mind if your application is bandwidth starved.
Oftentimes, however, polling is unavoidable. In these situations, we recommend using an exponential backoff to check job status. An exponential backoff is an algorithm that increases the time between retries as the number of failures increases.
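A minimal Python sketch of such a backoff is shown below. The function name is illustrative; get_status stands for any caller-supplied wrapper around the GET /jobs/v2/$JOB_ID/status call, and the terminal-status set is taken from the status table above.

```python
import time

TERMINAL_STATUSES = {"FINISHED", "FAILED", "KILLED", "STOPPED"}

def poll_until_done(get_status, initial_delay=5, max_delay=300, sleep=time.sleep):
    """Poll a job's status with exponential backoff.

    get_status is any callable returning the current job status string.
    The delay between checks doubles each iteration, capped at
    max_delay seconds, so a long-running job is checked ever less
    frequently instead of hammering the Jobs service.
    """
    delay = initial_delay
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

Injecting the sleep function keeps the loop testable and lets callers add jitter if many clients poll at once.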
Webhooks¶
Webhooks are the alternative, preferred way for your application to monitor the status of asynchronous actions in Tapis. If you are a Gang of Four disciple, webhooks are a mechanism for implementing the Observer pattern. They are widely used across the web, and chances are that something you’re using right now leverages them. In the context of Tapis, a webhook is a URL that you give to Tapis in advance of an event, which Tapis later POSTs to when that event occurs. A webhook can be any web-accessible URL.
Note: For more information about webhooks, events, and notifications in Tapis, please see the Notifications and Events guides.
The Jobs service provides several template variables for constructing dynamic URLs. Template variables can be included anywhere in your URL by surrounding the variable name in the following manner: ${VARIABLE_NAME}. When an event of interest occurs, the variables will be resolved and the resulting URL called.
The full list of template variables is given in the following table.
Variable | Description |
---|---|
UUID | The UUID of the job |
EVENT | The event which occurred |
JOB_STATUS | The status of the job at the time the event occurs |
JOB_URL | The url of the job within the API |
JOB_ID | The unique id used to reference the job within Tapis. |
JOB_SYSTEM | ID of the job execution system (ex. ssh.execute.example.com) |
JOB_NAME | The user-supplied name of the job |
JOB_START_TIME | The time when the job started running in ISO8601 format. |
JOB_END_TIME | The time when the job stopped running in ISO8601 format. |
JOB_SUBMIT_TIME | The time when the job was submitted to Tapis for execution by the user in ISO8601 format. |
JOB_ARCHIVE_PATH | The path on the archive system where the job output will be staged. |
JOB_ARCHIVE_URL | The Tapis URL for the archived data. |
JOB_ERROR | The error message explaining why a job failed. Null if completed successfully. |
Table 3. Template variables available for use when defining webhooks for your job.
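Resolving these template variables amounts to simple string substitution. The sketch below, using Python's string.Template purely for illustration (it is not how Tapis itself is implemented), shows how a webhook URL with ${JOB_ID} and ${JOB_STATUS} placeholders would be expanded when an event fires:

```python
from string import Template

def resolve_webhook_url(url_template, job_event):
    """Fill ${VARIABLE} placeholders in a webhook URL.

    job_event maps template variable names from Table 3 (UUID, EVENT,
    JOB_STATUS, ...) to their values at the time the event fires.
    Unknown placeholders are left untouched here; the real service's
    handling of unknown variables may differ.
    """
    return Template(url_template).safe_substitute(job_event)

url = resolve_webhook_url(
    "https://example.com/hook?job=${JOB_ID}&status=${JOB_STATUS}",
    {"JOB_ID": "1234-abc-007", "JOB_STATUS": "RUNNING"},
)
```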
Email¶
In situations where you do not have a persistent web address or access to a backend service, you may find it more convenient to subscribe for email notifications rather than providing a webhook. Tapis supports email notifications as well. Simply specify a valid email address in the url field of your job submission notification object, and an email will be sent to that address when a relevant event occurs.
Stopping¶
Once your job is submitted, you have the ability to stop the job. This will kill the job on the system on which it is running.
You can kill a job with the following CLI command:
tapis jobs cancel $JOB_UUID
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "action=kill" https://api.tacc.utexas.edu/jobs/v2/$JOB_ID
{
"id" : "$JOB_ID",
"name" : "demo-pyplot-demo-advanced test-1414139896",
"owner" : "$API_USERNAME",
"appId" : "demo-pyplot-demo-advanced-0.1.0",
"executionSystem" : "$PUBLIC_EXECUTION_SYSTEM",
"batchQueue" : "debug",
"nodeCount" : 1,
"processorsPerNode" : 1,
"memoryPerNode" : 1.0,
"maxRunTime" : "01:00:00",
"archive" : false,
"retries" : 0,
"localId" : "10321",
"outputPath" : null,
"status" : "STOPPED",
"submitTime" : "2014-10-24T04:48:11.000-05:00",
"startTime" : "2014-10-24T04:48:08.000-05:00",
"endTime" : null,
"inputs" : {
"dataset" : "agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv"
},
"parameters" : {
"chartType" : "bar",
"height" : "512",
"showLegend" : "false",
"xlabel" : "Time",
"background" : "#FFF",
"width" : "1024",
"showXLabel" : "true",
"separateCharts" : "false",
"unpackInputs" : "false",
"ylabel" : "Magnitude",
"showYLabel" : "true"
},
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
},
"app" : {
"href" : "https://api.tacc.utexas.edu/apps/v2/demo-pyplot-demo-advanced-0.1.0"
},
"executionSystem" : {
"href" : "https://api.tacc.utexas.edu/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
},
"archiveData" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
},
"owner" : {
"href" : "https://api.tacc.utexas.edu/profiles/v2/$API_USERNAME"
},
"permissions" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/pems"
},
"history" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/history"
},
"metadata" : {
"href" : "https://api.tacc.utexas.edu/meta/v2/data/?q=%7B%22associationIds%22%3A%220001414144065563-5056a550b8-0001-007%22%7D"
},
"notifications" : {
"href" : "https://api.tacc.utexas.edu/notifications/v2/?associatedUuid=0001414144065563-5056a550b8-0001-007"
}
}
}
Deleting a job¶
Over time, the number of jobs you have run can grow rather large. You can delete jobs to remove them from your listing results with the following command:
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X DELETE https://api.tacc.utexas.edu/jobs/v2/$JOB_ID
Warning: Deleting a job hides it from view; it does not permanently delete the record.
Resubmitting a job¶
Oftentimes you will want to rerun a previous job as part of a pipeline, as part of an automation, or to validate that the results were reproducible. In this situation, it is convenient to use the resubmit feature of the Jobs service.
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "action=resubmit" https://api.tacc.utexas.edu/jobs/v2/$JOB_ID
Resubmission provides you the options to enforce as much or as little rigor as you desire with respect to reproducibility in the job submission process. The following options are available to you for configuring a resubmission according to your requirements.
Field | Type | Description |
---|---|---|
ignoreInputConflicts | boolean | Whether to ignore discrepancies in the previous app inputs for the resubmitted job. If true, the resubmitted job will make a best-fit attempt at migrating the inputs. |
ignoreParameterConflicts | boolean | Whether to ignore discrepancies in the previous app parameters for the resubmitted job. If true, the resubmitted job will make a best-fit attempt at migrating the parameters. |
preserveNotifications | boolean | Whether to recreate the notifications of the original job for the resubmitted job. |
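This guide invokes other job actions (kill, resubmit) as form-encoded POSTs, so a reasonable sketch of a configured resubmission, assuming these options travel as form fields alongside action=resubmit (an assumption, not confirmed by the text above), is:

```python
from urllib.parse import urlencode

def resubmit_body(ignore_input_conflicts=False,
                  ignore_parameter_conflicts=False,
                  preserve_notifications=False):
    """Build a form-encoded body for a job resubmission request.

    Field names come from the table above; sending them as form
    fields alongside action=resubmit is an assumption based on how
    the other job actions in this guide are invoked.
    """
    fields = {
        "action": "resubmit",
        "ignoreInputConflicts": str(ignore_input_conflicts).lower(),
        "ignoreParameterConflicts": str(ignore_parameter_conflicts).lower(),
        "preserveNotifications": str(preserve_notifications).lower(),
    }
    return urlencode(fields)

body = resubmit_body(preserve_notifications=True)
# The body would then be POSTed to https://api.tacc.utexas.edu/jobs/v2/$JOB_ID
```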
Outputs¶
Throughout the lifecycle of a job, your inputs, application assets, and outputs are copied from and shuffled between several different locations. Though it is possible in many instances to explicitly locate and view all the moving pieces of your job through the Files service, resolving where those pieces are given the status, execution system, storage systems, data protocols, login protocols, and execution mechanisms of your job at a given time is…challenging. It is important, however, that you have the ability to monitor your job’s output throughout the lifetime of the job.
To make tracking the output of a specific job easier, the Jobs service provides a special URL for referencing individual job outputs:
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/outputs/listings/?pretty=true
The syntax of this service is consistent with the Files service syntax, as is the JSON response from the service. The response would be similar to the following:
{
"status" : "success",
"message" : null,
"version" : "2.1.0-r6d11c",
"result" : [ {
"name" : "output",
"path" : "/output",
"lastModified" : "2014-11-06T13:34:35.000-06:00",
"length" : 0,
"permission" : "NONE",
"mimeType" : "text/directory",
"format" : "folder",
"type" : "dir",
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/output"
},
"system" : {
"href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
},
"parent" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
}
}
}, {
"name" : "demo-pyplot-demo-advanced-test-1414139896.err",
"path" : "/demo-pyplot-demo-advanced-test-1414139896.err",
"lastModified" : "2014-11-06T13:34:27.000-06:00",
"length" : 442,
"permission" : "NONE",
"mimeType" : "application/octet-stream",
"format" : "unknown",
"type" : "file",
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/demo-pyplot-demo-advanced-test-1414139896.err"
},
"system" : {
"href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
},
"parent" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
}
}
}, {
"name" : "demo-pyplot-demo-advanced-test-1414139896.out",
"path" : "/demo-pyplot-demo-advanced-test-1414139896.out",
"lastModified" : "2014-11-06T13:34:30.000-06:00",
"length" : 1396,
"permission" : "NONE",
"mimeType" : "application/octet-stream",
"format" : "unknown",
"type" : "file",
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/demo-pyplot-demo-advanced-test-1414139896.out"
},
"system" : {
"href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
},
"parent" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
}
}
}, {
"name" : "demo-pyplot-demo-advanced-test-1414139896.pid",
"path" : "/demo-pyplot-demo-advanced-test-1414139896.pid",
"lastModified" : "2014-11-06T13:34:33.000-06:00",
"length" : 6,
"permission" : "NONE",
"mimeType" : "application/octet-stream",
"format" : "unknown",
"type" : "file",
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/demo-pyplot-demo-advanced-test-1414139896.pid"
},
"system" : {
"href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
},
"parent" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
}
}
}, {
"name" : "testdata.csv",
"path" : "/testdata.csv",
"lastModified" : "2014-11-06T13:34:42.000-06:00",
"length" : 3212,
"permission" : "NONE",
"mimeType" : "application/octet-stream",
"format" : "unknown",
"type" : "file",
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/media/testdata.csv"
},
"system" : {
"href" : "https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
},
"parent" : {
"href" : "https://api.tacc.utexas.edujobs/v2/0001414144065563-5056a550b8-0001-007"
}
}
} ]
}
To download a file you would use the following syntax:
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/outputs/media/$PATH
information_source: The Jobs output service follows the same conventions as the Files service. Thus, you may specify a range header to retrieve a specific byte range. This is particularly helpful when tracking job progress since it gives you a mechanism to tail the output and error log files.
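As a minimal sketch of the range-header technique, the snippet below builds a request for just the trailing bytes of a job output file, which is how you would tail a log. The job id, file name, and token are placeholders you would supply yourself; no network call is made here.

```python
# Build a GET request for the last `last_bytes` bytes of a job output file,
# using an HTTP Range suffix as supported by the outputs/media endpoint.
import urllib.request

def tail_request(api_host, job_id, path, access_token, last_bytes=1024):
    url = f"https://{api_host}/jobs/v2/{job_id}/outputs/media/{path}"
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {access_token}")
    # A suffix range asks the server for only the trailing bytes of the file.
    req.add_header("Range", f"bytes=-{last_bytes}")
    return req

req = tail_request("api.tacc.utexas.edu", "$JOB_ID", "job.out", "$ACCESS_TOKEN")
```

Sending the request with `urllib.request.urlopen(req)` would then return only the final kilobyte of the file, assuming the server honors the range.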
Regardless of job status, the above services will always point to the most recent location of the job data. If you choose for the Jobs service to archive your job after completion, the URL will point to the archive folder of the job. If you do not choose to archive your data, or if archiving fails, the URL will point to the execution folder created for your job at runtime. Because Tapis does not own any of the underlying hardware, it cannot guarantee that those locations will always exist. If, for example, the execution system enforces a purge policy, the output data may be deleted by the system administrators. Tapis will let you know if the data is no longer present, however, it cannot prevent it from being deleted. This is another reason that it is important to archive data you feel will be needed in the future.
Job Lifecycle Management¶
Tapis handles all of the end-to-end details involved with managing a job lifecycle for you. This can seem like black magic at times, so here we detail the overall lifecycle process every job goes through.
- Job request is made, validated, and saved.
- Job is queued up for execution. The job stays in a pending state until there are resources to run it. This means that the target execution system is online, the storage system with the app assets is online, and neither the user nor the system is over quota. a) If resources do not become available within 7 days, the job is killed. b) Once resources are available, the job moves on.
- When resources are available to run the job on the execution system, a work directory is created on the execution system. The job work directory is created based on the following logic:
if (executionSystem.scratchDir exists)
then
$jobDir = executionSystem.scratchDir
else if (executionSystem.workDir exists)
then
    $jobDir = executionSystem.workDir
else
    $jobDir = executionSystem.storage.homeDir
endif
$jobDir = $jobDir + "/" + job.owner + "/job-" + job.uuid
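The pseudocode above can be sketched in Python. The dictionary-based system objects here are stand-ins for illustration, not actual Tapis SDK types.

```python
# Resolve the job work directory following the fallback order above:
# scratchDir, then workDir, then the storage homeDir.
def resolve_job_dir(execution_system, job_owner, job_uuid):
    base = (execution_system.get("scratchDir")
            or execution_system.get("workDir")
            or execution_system["storage"]["homeDir"])
    return f"{base}/{job_owner}/job-{job_uuid}"

resolve_job_dir(
    {"scratchDir": "/scratch", "workDir": "/work",
     "storage": {"homeDir": "/home"}},
    "nryan", "0001414144065563-5056a550b8-0001-007")
```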
- The job inputs are staged to the job work directory and the job status is updated to "INPUTS_STAGING". a) All inputs succeed and the job is updated to "STAGED". b) One or more inputs fail to transfer. The job status is set back to "PENDING" and staging will be attempted up to 2 more times. c) The user does not have permission to access one or more inputs. The job is set to "FAILED" and exits.
- The job again waits until resources are available to run it. Usually this is immediately after the inputs finish staging. a) If resources do not become available within 7 days, the job is killed. b) Once resources are available, the job moves on.
- The app deploymentPath is copied from the app.deploymentSystem to a temp dir on the API server. The jobs API then processes the app.deploymentDir + "/" + app.templatePath file to create the .ipcexe file. The process goes as follows:
- Script headers are written. This includes scheduler directives for a batch app, or a shebang line for a forked app.
- Additional executionSystem[job.batchQueue].customDirectives are written
- "RUNNING" callback written
- Module commands are written
- executionSystem.environment is written
- wrapper script is filtered
- blacklisted commands are removed
- app parameter template variables are resolved against job parameter values.
- app input template variables are resolved against job input values
- blacklisted commands are removed again
- "CLEANING_UP" callback written
- All template macros are resolved.
- job.name.slugify + ".ipcexe" file written to temp directory
- App assets with wrapper template are copied to remote job work directory.
- Directory listing of job work directory is written to a .agave.archive manifest file in the remote job work directory.
- Command line is generated to invoke the *.ipcexe file by the appropriate method for the execution system.
- Command line is run on the remote system. a) The command succeeds: the scheduler/process/job id is captured and stored with the job record. b) The command fails: the job returns to "STAGED" status and the command is retried up to 2 more times.
- Job is updated to "QUEUED"
- Job waits for a "RUNNING" callback and adds a background process to monitor the job in case the callback never comes.
- Callback checks the job status according to the following schedule:
* every 30 seconds for the first 5 minutes
* every minute for the next 30 minutes
* every 5 minutes for the next hour
* every 15 minutes for the next 12 hours
* every 30 minutes for the next 24 hours
* every hour for the next 14 days
The job either calls back with a "CLEANING_UP" status update or the monitoring process discovers the job no longer exists on the remote system.
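The polling schedule above can be expressed as a small lookup: given the minutes elapsed since monitoring began, return the current polling interval in seconds. This is an illustration of the documented schedule, not the service's actual implementation.

```python
# Return the polling interval (in seconds) for a job that has been
# monitored for `elapsed_minutes`, following the documented back-off.
def check_interval(elapsed_minutes):
    schedule = [
        (5, 30),                       # first 5 minutes: every 30 s
        (35, 60),                      # next 30 minutes: every minute
        (95, 300),                     # next hour: every 5 minutes
        (815, 900),                    # next 12 hours: every 15 minutes
        (2255, 1800),                  # next 24 hours: every 30 minutes
        (2255 + 14 * 24 * 60, 3600),   # next 14 days: every hour
    ]
    for cumulative_limit, interval in schedule:
        if elapsed_minutes < cumulative_limit:
            return interval
    return None  # monitoring window exhausted
```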
- If job.archive is true, the job is sent to the archiving queue to stage outputs to job.archiveSystem. a) If resources do not become available within 7 days, the job is killed. b) Once resources are available, the job moves on.
- Read the .agave.archive manifest file from the job work directory
- Begin a breadth first directory traversal of the job work directory
- If a file/folder is not in the .agave.archive manifest, copy it to the job.archivePath on the job.archiveSystem
- Delete the job work directory
- Update job status to "FINISHED"
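The archiving pass above can be sketched as a filter: anything listed in the .agave.archive manifest existed before the job ran (app assets, staged inputs) and is skipped, while everything else is treated as job output and copied. A minimal illustration:

```python
# Given a listing of the job work directory and the paths recorded in
# the .agave.archive manifest, return the paths that should be copied
# to job.archivePath on job.archiveSystem.
def files_to_archive(work_dir_listing, manifest_entries):
    manifest = set(manifest_entries)
    return [path for path in work_dir_listing if path not in manifest]

files_to_archive(
    ["wrapper.ipcexe", "testdata.csv", "results/output.dat"],
    ["wrapper.ipcexe", "testdata.csv"],
)
```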
Jobs Permissions and Sharing¶
As with the Systems, Apps, and Files services, your jobs have their own set of access controls. Using these, you can share your job and its data with other Tapis users. Job permissions are private by default. The permissions you give a job apply to the job itself, its outputs, its metadata, and the permissions themselves. Thus, by sharing a job with another user, you share all aspects of that job.
Job permissions are managed through a set of URLs consistent with the permissions URL elsewhere in the API.
Granting¶
Granting permissions is simply a matter of issuing a POST with the desired permission object to the job’s pems collection.
tapis jobs pems grant $JOB_UUID $USERNAME $PERMISSION
# General grant
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST --data-binary '{"permission":"READ","username":"$USERNAME"}' \
https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems
# Custom url grant
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST --data-binary '{"permission":"READ"}' \
https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME
{
"username": "$USERNAME",
"internalUsername": null,
"permission": {
"read": true,
"write": false
},
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME"
},
"parent": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/$USERNAME"
}
}
}
The available permission values are listed in Table 2.
Permission | Description |
---|---|
READ | Gives the ability to view the job status, and output data. |
WRITE | Gives the ability to perform actions, manage metadata, and set permissions. |
ALL | Gives full READ and WRITE permissions to the user. |
READ_WRITE | Synonymous with ALL. Gives full READ and WRITE permissions to the user. |
Table 2. Supported job permission values.
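The permission values in Table 2 map onto the read/write flags shown in the JSON responses above. The sketch below illustrates that mapping; it assumes WRITE alone does not imply read access, which the table leaves implicit.

```python
# Map a job permission value to the read/write flags returned by the
# pems endpoints. This is an illustration of Table 2, not a Tapis API.
def to_flags(permission):
    return {
        "READ": {"read": True, "write": False},
        "WRITE": {"read": False, "write": True},   # assumption: no implied read
        "ALL": {"read": True, "write": True},
        "READ_WRITE": {"read": True, "write": True},
    }[permission.upper()]
```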
Job permissions are distinct from file permissions. In many instances, your job output will be accessible via the Files and Jobs services simultaneously. Granting a user permissions to a job output file through the Files services does not alter the accessibility of that file through the Jobs service. It is important, then, that you consider to whom you grant permissions, and the implications of that decision in all areas of your application.
Listing¶
To find the permissions for a given job, make a GET request on the job’s pems collection. Here we see that both the job owner and the user we just granted permission to appear in the response.
tapis jobs pems list -V $JOB_UUID
curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
'https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/'
[
{
"username": "$API_USERNAME",
"internalUsername": null,
"permission": {
"read": true,
"write": true
},
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/jobs/v2/6608339759546166810-242ac114-0001-007/pems/$API_USERNAME"
},
"parent": {
"href": "https://api.tacc.utexas.edu/jobs/v2/6608339759546166810-242ac114-0001-007"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/$API_USERNAME"
}
}
},
{
"username": "$USERNAME",
"internalUsername": null,
"permission": {
"read": true,
"write": false
},
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME"
},
"parent": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/$USERNAME"
}
}
}
]
Updating¶
Updating is exactly like granting permissions. Just POST to the same job’s pems collection.
tapis jobs pems grant $JOB_UUID $USERNAME $PERMISSION
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-X POST --data-binary '{"permission":"READ_WRITE"}' \
https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME
{
"username": "$USERNAME",
"internalUsername": null,
"permission": {
"read": true,
"write": true
},
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME"
},
"parent": {
"href": "https://api.tacc.utexas.edu/jobs/v2/$JOB_ID"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/$USERNAME"
}
}
}
Deleting¶
To delete a permission, you can issue a DELETE request on the user permission resource we’ve been using, or update with an empty permission value.
tapis jobs pems revoke $JOB_UUID $USERNAME
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://api.tacc.utexas.edu/jobs/v2/$JOB_ID/pems/$USERNAME
Notifications¶
Under the covers, the Tapis API is an event-driven distributed system implemented on top of a reliable, cloud-based messaging system. This means that every action either observed or taken by Tapis is tied to an event. The changing of a job from one status to another is an event. The granting of permissions on a file is an event. Editing a piece of metadata is an event, and to be sure, the moment you created an account with Tapis was an event. You get the idea.
Having such a fine-grain event system is helpful for the same reason that having a fine-grain permission model is helpful. It affords you the highest degree of flexibility and control possible to achieve the behavior you desire. With Tapis’s event system, you have the ability to alert your users (or yourself) the instant something occurs. You can be proactive rather than reactive, and you can begin orchestrating your complex tasks in a loosely coupled, asynchronous way.
Subscriptions¶
As consumers of Tapis, you have the ability to subscribe to events occurring on any resource to which you have access. By that we mean, for example, you could subscribe to events on your job and a job that someone shared with you, but you could not subscribe to events on a job submitted by someone else who has not shared the job with you. Basically, if you can see a resource, you can subscribe to its events.
The Notifications service is the primary mechanism by which you create and manage your event subscriptions. A typical use case is a user subscribing for an email alert when her job completes. The following JSON object represents a request for such a notification.
Example notification subscription request
{
"associatedUuid": "0001409758089943-5056a550b8-0001-002",
"event": "OVERWRITTEN",
"persistent": true,
"url": "nryan@rangers.mlb.com"
}
The associatedUuid value is the UUID of the resource whose events she wants to follow. Here, we have given the UUID of the picsumipsum.txt file we uploaded in the Files Guide. The event value is the name of the event for which she wants to be notified. This example is asking for an email to be sent whenever the file is overwritten. She could have just as easily specified an event of DELETED or RENAME to be notified when the file was deleted or renamed.
The persistent value specifies whether the notification should fire more than once. By default, all event subscriptions are transient. This is because the events themselves are transient. An event occurs, then it is over. There are, however, many situations where events could occur over and over again. Permission events, changes to metadata and data, application registrations on a system, job submissions to a system or queue, etc., all are transient events that can potentially occur many, many times. In these cases it is either not possible or highly undesirable to constantly resubscribe for the same event. The persistent attribute tells the notification service to keep a subscription alive until it is explicitly deleted.
information_source: In certain situations you may wish to subscribe to multiple events. You are free to add as many subscriptions as you wish; however, in the event that you want to subscribe to all possible events for a given resource, use the wildcard value, *, as the event. This tells the Notifications service that you want to be notified of every event for that resource.
information_source: A listing of all of Tapis’s resource-level events, grouped by resource, can be found in the Events section.
Continuing to work through the example, the url value specifies where the notification should be sent. In this example, our example user specified that she would like to be notified via email. Tapis supports both email and webhook notifications. If you are unfamiliar with webhooks, take a moment to glance at the webhooks.org page for a brief overview. If you are a Gang of Four disciple, webhooks are a mechanism for implementing the Observer Pattern. Webhooks are widely used across the web and chances are that something you’re using right now is leveraging them.
URL Macros¶
In the context of Tapis, a webhook is a URL to which Tapis will send a POST request when that event occurs. A webhook can be any web accessible URL. While you cannot customize the POST content that Tapis sends (it is unique to the event), you can take advantage of the many template variables that Tapis provides to customize the URL at run time. The following table shows the webhook template variables available for the apps resource; other resources expose analogous variables.
Variable | Description |
---|---|
UUID | The UUID of the app. |
EVENT | The event which occurred |
APP_ID | The application id (ex. sabermetrics-2.1) |
The value of webhook template variables is that they allow you to build custom callbacks using the values of the resource's attributes at run time. Several commonly used webhook patterns are shown below.
Receive a callback when a new user is created that includes the new user’s information
https://example.com/sendWelcome.php?username=${USERNAME}&email=${EMAIL}&firstName=${FIRST_NAME}&lastName=${LAST_NAME}&src=api.tacc.utexas.edu&nonce=1234567
Receive self-describing job status updates
http://example.com/job/${JOB_ID}?status=${STATUS}&lastUpdated=${LAST_UPDATED}
Get notified on all jobs going into and out of queues
http://example.com/system/${EXECUTION_SYSTEM}/queue/${QUEUE}?action=add
http://example.com/system/${EXECUTION_SYSTEM}/queue/${QUEUE}?action=subtract
Rerun an analysis when a file finishes staging
https://$TAPIS_BASE_URL/jobs/v2/a32487q98wasdfa9-09090b0b-007?action=resubmit
Use plus mailing to route job notifications to different folders
nryan+${EXECUTION_SYSTEM}+${JOB_ID}@gmail.com
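The macro expansion Tapis performs on these URLs can be sketched as a simple substitution: each `${NAME}` token is replaced with the resource's value at publish time. The values below are illustrative; Tapis substitutes the real ones when the event fires.

```python
# Expand ${NAME}-style webhook template variables in a URL, leaving any
# unrecognized macros untouched. An illustration of the behavior, not
# the Tapis implementation.
import re

def expand_macros(url_template, values):
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: values.get(m.group(1), m.group(0)),
                  url_template)

expand_macros("http://example.com/job/${JOB_ID}?status=${STATUS}",
              {"JOB_ID": "0001-007", "STATUS": "FINISHED"})
```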
Creating¶
Create a new notification subscription with the following CLI command:
tapis notifications create -F notification.json
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
-H "Content-Type: application/json" \
--data-binary '{"associatedUuid": "7554973644402463206-242ac114-0001-007", "event": "FINISHED", "url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}" }' \
https://api.tacc.utexas.edu/notifications/v2?pretty=true
{
"id": "7612526206168863206-242ac114-0001-011",
"owner": "nryan",
"url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
"associatedUuid": "7554973644402463206-242ac114-0001-007",
"event": "FINISHED",
"responseCode": null,
"attempts": 0,
"lastSent": null,
"success": false,
"persistent": false,
"status": "ACTIVE",
"lastUpdated": "2016-08-24T10:07:03.000-05:00",
"created": "2016-08-24T10:07:03.000-05:00",
"policy": {
"retryLimit": 5,
"retryRate": 5,
"retryDelay": 0,
"saveOnFailure": true,
"retryStrategy": "NONE"
},
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011"
},
"history": {
"href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011/history"
},
"attempts": {
"href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011/attempts"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"job": {
"href": "https://api.tacc.utexas.edu/jobs/v2/7554973644402463206-242ac114-0001-007"
}
}
}
Updating¶
Updating a subscription is done identically to creation except that the form or JSON is POSTed to the existing subscription URL. An example of doing this using curl as well as the CLI is given below.
The updated notification subscription object:
{
"associatedUuid": "7554973644402463206-242ac114-0001-007",
"event": "*",
"url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}"
}
CLI command to update subscription, using the above JSON:
tapis notifications create -F notification.json 2699130208276770330-242ac114-0001-011
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
-H "Content-Type: application/json" \
--data-binary @notification.json \
https://api.tacc.utexas.edu/notifications/v2/2699130208276770330-242ac114-0001-011
{
"id": "7612526206168863206-242ac114-0001-011",
"owner": "nryan",
"url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
"associatedUuid": "7554973644402463206-242ac114-0001-007",
"event": "*",
"responseCode": null,
"attempts": 0,
"lastSent": null,
"success": false,
"persistent": false,
"status": "ACTIVE",
"lastUpdated": "2016-08-24T10:07:03.000-05:00",
"created": "2016-08-24T10:07:03.000-05:00",
"policy": {
"retryLimit": 5,
"retryRate": 5,
"retryDelay": 0,
"saveOnFailure": true,
"retryStrategy": "NONE"
},
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011"
},
"history": {
"href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011/history"
},
"attempts": {
"href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011/attempts"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"job": {
"href": "https://api.tacc.utexas.edu/jobs/v2/7554973644402463206-242ac114-0001-007"
}
}
}
Listing¶
You can get a list of your current notification subscriptions by performing a GET operation on the base /notifications collection. Adding the UUID of a notification will return just that notification. You can also query for all notifications assigned to a specific UUID by adding associatedUuid=$uuid to the query string. Examples using curl as well as the CLI are given below.
List all notification subscriptions with the following CLI command:
tapis notifications list -v
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://api.tacc.utexas.edu/notifications/v2/
[
{
"id": "7612526206168863206-242ac114-0001-011",
"url": "http://requestbin.agaveapi.co/zyiomxzy?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
"associatedUuid": "7554973644402463206-242ac114-0001-007",
"event": "*",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/notifications/v2/7612526206168863206-242ac114-0001-011"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"job": {
"href": "https://api.tacc.utexas.edu/jobs/v2/7554973644402463206-242ac114-0001-007"
}
}
},
{
"id": "7404907487080223206-242ac114-0001-011",
"url": "nryan@rangers.texas.mlb.com",
"associatedUuid": "6904887394479903206-242ac114-0001-007",
"event": "FINISHED",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/notifications/v2/7404907487080223206-242ac114-0001-011"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"job": {
"href": "https://api.tacc.utexas.edu/jobs/v2/6904887394479903206-242ac114-0001-007"
}
}
},
{
"id": "3676815741209931290-242ac114-0001-011",
"url": "nryan@rangers.texas.mlb.com",
"associatedUuid": "3717016635100491290-242ac114-0001-007",
"event": "FINISHED",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/notifications/v2/3676815741209931290-242ac114-0001-011"
},
"profile": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"job": {
"href": "https://api.tacc.utexas.edu/jobs/v2/3717016635100491290-242ac114-0001-007"
}
}
}
]
Unsubscribing¶
To unsubscribe from an event, perform a DELETE on the notification URL. Once deleted, you cannot restore a subscription. You can, however, create a new one. Keep in mind that if you do this, the UUID of the new notification will be different from that of the deleted one. Examples using curl as well as the CLI are given below.
Unsubscribe from a notification subscription with the following CLI command:
tapis notifications delete 2699130208276770330-242ac114-0001-011
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://api.tacc.utexas.edu/notifications/v2/2699130208276770330-242ac114-0001-011
A standard Tapis response with an empty result will be returned.
Retry Policies¶
In some situations, Tapis may be unable to publish a specific notification. When this happens, Tapis will immediately retry the notification 5 times in an attempt to deliver it successfully. When delivery fails for a 5th time, the notification is abandoned. If your application requires a more tenacious or methodical approach to retry delivery, you may provide a notification policy.
Example notification subscription object with custom retry policy:
{
"url" : "$REQUEST_BIN?path=${PATH}&system=${SYSTEM}&event=${EVENT}",
"event" : "*",
"persistent": true,
"policy": {
"retryStrategy": "IMMEDIATE",
"retryLimit": 20,
"retryRate": 5,
"retryDelay": 0,
"saveOnFailure": true
}
}
Name | Type | Description |
---|---|---|
retryStrategy | NONE, IMMEDIATE, DELAYED, EXPONENTIAL | The retry strategy to employ. Default is IMMEDIATE |
retryRate | int; 0:86400 | The frequency with which attempts should be made to deliver the message. |
retryLimit | int; 0:1440 | The maximum number of attempts that should be made to deliver the message. |
retryDelay | int; 0:86400 | The initial delay between the initial delivery attempt and the first retry. |
saveOnFailure | boolean | Whether the failed message should be persisted if unable to be delivered within the retryLimit |
Notification retry policies describe the strategy, frequency, delay, limit, and persistence to be applied when publishing an individual event for a given notification. The example above is our previous example with a notification policy included.
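One plausible reading of how these fields interact is sketched below; the exact semantics of each strategy are internal to Tapis, so this is an illustration under stated assumptions, not the service's implementation. Offsets are seconds after the initial failed delivery.

```python
# Hypothetical retry schedule: IMMEDIATE fires retries back to back,
# DELAYED spaces them by retryRate, EXPONENTIAL doubles the gap each
# attempt, and NONE never retries. retryDelay shifts the first retry.
def retry_offsets(strategy, retry_limit, retry_rate, retry_delay):
    if strategy == "NONE":
        return []
    offsets, t = [], retry_delay
    for attempt in range(retry_limit):
        offsets.append(t)
        if strategy == "IMMEDIATE":
            step = 0                          # no pause between attempts
        elif strategy == "DELAYED":
            step = retry_rate                 # fixed pause
        else:                                 # "EXPONENTIAL"
            step = retry_rate * 2 ** attempt  # doubling pause
        t += step
    return offsets
```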
Failed deliveries¶
By providing a retry policy where saveOnFailure is true, failed messages will be persisted and made available for querying at a later time. This is a great way to handle missed work due to a server failure, maintenance downtime, etc.
To query failed attempts for a specific notification, make a GET request on the notification's attempts collection:
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://$API_BASE_URL/notifications/$API_VERSION/229681451607921126-8e1831906a8e-0001-042/attempts
A list of notification attempts will be returned.
[
{
"id" : "229681451607921126-8e1831906a8e-0001-042",
"url" : "https://httpbin.org/status/500",
"event" : "SENT",
"associatedUuid" : "5833036796741676570-b0b0b0bb0b-0001-011",
"startTime" : "2016-06-19T22:21:02.266-05:00",
"endTime" : "2016-06-19T22:21:03.268-05:00",
"response" : {
"code" : 500,
"message" : ""
},
"_links" : {
"self" : {
"href" : "https://$API_BASE_URL/notifications/$API_VERSION/229123105859441126-8e1831906a8e-0001-011/attempts/229681451607921126-8e1831906a8e-0001-042"
},
"notification" : {
"href" : "https://$API_BASE_URL/notifications/$API_VERSION/5833036796741676570-b0b0b0bb0b-0001-011"
},
"profile" : {
"href" : "https://$API_BASE_URL/profiles/$API_VERSION/ipcservices"
}
}
}
]
Note: There is no way to save successful notification deliveries.
PostIts¶
The PostIts service is a URL shortening service similar to bit.ly, goo.gl, and t.co. It allows you to create pre-authenticated, disposable URLs to any resource in the Tapis Platform. You have control over the lifetime and number of times the URL can be redeemed, and you can expire a PostIt at any time. The most common use of PostIts is to create URLs to files so that you can share with others without having to upload them to a third-party service. Anytime you need to share your science with your world, PostIts can help you.
Creating PostIts¶
To create a PostIt, send a POST request to the PostIts service with the target url you want to share. In this example, we are sharing a file we have in Tapis’s cloud storage account.
In the response you see standard fields such as the created timestamp and the postit token. You also see several fields that lead into the discussion of another aspect of PostIts, such as the ability to restrict usage and expire them on demand.
When creating a postit, you have the option to specify a number of allowed uses and an expiration, or to create an unlimited postit. If maxUses or lifetime is not provided, the default values will be applied regardless of whether the postit is unlimited. If the postit is unlimited, these values act only as placeholders and are not used when redeeming.
Default parameters:
- maxUses - 1
- lifetime - 30 days
- unlimited - false
You can create a postit with either content type ‘application/json’ or ‘application/x-www-form-urlencoded’. The target URL must contain the base URL for the correct tenant. The url must also point to one of the following Tapis services: JOBS, FILES, APPS or SYSTEMS.
APPLICATION/JSON examples
Creating a postit with maxUses and lifetime:
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d '{"maxUses": 3, "lifetime": 600, "url": "<target_url>"}' -H "Content-Type: application/json" "https://api.tacc.utexas.edu/postits/v2?pretty=true"
Creating unlimited postit:
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d '{"unlimited": true, "url": "<target_url>"}' -H "Content-Type: application/json" "https://api.tacc.utexas.edu/postits/v2?pretty=true"
APPLICATION/X-WWW-FORM-URLENCODED examples
Creating a postit with maxUses and lifetime:
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "maxUses=3&lifetime=600&url=<target_url>" "https://api.tacc.utexas.edu/postits/v2?pretty=true"
Creating unlimited postit:
$ curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST -d "unlimited=true&url=<target_url>" "https://api.tacc.utexas.edu/postits/v2?pretty=true"
CLI example
(Note: CLI does not currently support unlimited postits)
tapis postits create \
-m 10 \
-L 86400 \
https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org/nryan/picksumipsum.txt
Example Postit Creation Response
{
"creator": "jstubbs",
"createdAt": "2020-09-30T21:51:31-05:00",
"expiresAt": "2020-10-01T00:14:51-05:00",
"remainingUses": 10,
"postit": "0feb1aa5-01aa-4445-b580-a008064a4c44-010",
"numberUsed": 0,
"tenantId": "tacc.prod",
"status": "ACTIVE",
"noauth": false,
"method": "GET",
"url": "https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org//home/jstubbs/picksumipsum.txt",
"_links":{
"self":{
"href":"https://api.tacc.utexas.edu/postits/v2/0feb1aa5-01aa-4445-b580-a008064a4c44-010"
},
"profile":{
"href":"https://api.tacc.utexas.edu/profiles/v2/jstubbs"
},
"file":{
"href":"https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org//home/jstubbs/picksumipsum.txt"
},
"update":{
"href":"https://api.tacc.utexas.edu/postits/v2/update/0feb1aa5-01aa-4445-b580-a008064a4c44-010"
},
"list":{
"href":"https://api.tacc.utexas.edu/postits/v2/listing/0feb1aa5-01aa-4445-b580-a008064a4c44-010"
}
}
}
Available parameters to create a postit.
JSON Parameter | JSON Type | Description |
---|---|---|
maxUses | integer | The number of times a postit can be redeemed. Must be at least 1. Negative values are not allowed. |
lifetime | integer | How long the postit will live, in seconds. This number is used to generate the expiration time and date by adding the seconds to the current date and time. The resulting expiration time must be before date 1/19/2038. |
force | boolean | Appends the force argument to the curl command. |
unlimited | boolean | True to create a postit that does not have an expiration date or max uses. |
url | string | The url to be redeemed by the postit. Always required. |
noauth | boolean | Legacy parameter that will be accepted, but ignored by the new Aloe service. |
internalUsername | string | Legacy parameter that will be accepted, but ignored by the new Aloe service. |
method | string | Legacy parameter that will be accepted, but ignored by the new Aloe service. |
warning: If you intend to use a PostIt as a link in a web page or a messaging service like Slack, HipChat, Facebook, Twitter, etc., which unfurl URLs for display, then you should set the maximum uses greater than 4 due to the number of preflight requests made to the URL for display. Failing to do so will result in the URL showing up in the feed, but failing to resolve when clicked to download.
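The lifetime parameter resolves to an absolute expiresAt timestamp. Below is a minimal Python sketch of the documented rule; it is illustrative only (the service performs this calculation server-side, and compute_expiry is a hypothetical name):

```python
from datetime import datetime, timedelta, timezone

# Expirations must fall before 1/19/2038 per the parameter table above.
MAX_EXPIRY = datetime(2038, 1, 19, tzinfo=timezone.utc)

def compute_expiry(lifetime_seconds, now=None):
    """expiresAt = now + lifetime, which must precede 1/19/2038."""
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=lifetime_seconds)
    if expires_at >= MAX_EXPIRY:
        raise ValueError("lifetime pushes expiration past 1/19/2038")
    return expires_at

# -L 86400 in the curl example above corresponds to a one-day lifetime
print(compute_expiry(86400, now=datetime(2020, 9, 30, tzinfo=timezone.utc)))
```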
Listing PostIts¶
To list all currently active PostIts, see the following commands:
tapis postits list -v
curl -sk -H "Authorization: Bearer $AUTH_TOKEN" 'https://api.tacc.utexas.edu/postits/v2/?pretty=true'
The curl interface also allows listing postits by status. Just use ?status=<status> at the end of the URL. For example, the following curl would return all expired postits. See the table below for other status options.
curl -sk -H "Authorization: Bearer $AUTH_TOKEN" \
'https://api.tacc.utexas.edu/postits/v2/?pretty=true&status=expired'
Status Fields
Status | Description |
---|---|
ACTIVE | Postit is redeemable. |
EXPIRED_AND_NO_USES | Postit is both expired and out of remaining uses. |
EXPIRED | Postit has expired. |
NO_USES | Postit is out of remaining uses. |
REVOKED | The postit has been revoked. Can no longer redeem nor update this postit. |
NOT_FOUND | (Not a status) Indicates status could not be calculated. |
ALL | (Not a status) Indicates to include all statuses. |
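The statuses in the table are derived from a postit's expiration and remaining uses. The following Python sketch shows how a client might compute the same status locally; postit_status is a hypothetical helper and the service computes status server-side:

```python
from datetime import datetime, timezone

def postit_status(expires_at, remaining_uses, revoked=False, now=None):
    """Derive a postit status per the table above (client-side approximation)."""
    if revoked:
        return "REVOKED"
    now = now or datetime.now(timezone.utc)
    expired = expires_at is not None and now >= expires_at
    no_uses = remaining_uses is not None and remaining_uses <= 0
    if expired and no_uses:
        return "EXPIRED_AND_NO_USES"
    if expired:
        return "EXPIRED"
    if no_uses:
        return "NO_USES"
    return "ACTIVE"

now = datetime(2020, 10, 2, tzinfo=timezone.utc)
# Expired yesterday but with uses left -> EXPIRED
print(postit_status(datetime(2020, 10, 1, tzinfo=timezone.utc), 5, now=now))
```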
Listing Single PostIt¶
You can list the information for any PostIt UUID, as long as it is on the same tenant.
List a single postit
curl -H "Authorization: Bearer $AUTH_TOKEN" 'https://api.tacc.utexas.edu/postits/v2/listing/0feb1aa5-01aa-4445-b580-a008064a4c44-010'
Updating PostIts¶
The creator of a postit and tenant admins can update a postit. One may update maxUses, lifetime, and unlimited. If a postit transitions from unlimited to limited without maxUses and lifetime, the current expiration and remaining uses are used. When updating the lifetime, a new expiration time is calculated from the lifetime sent in; it does not add to the current expiration time.
If you need to update other fields, such as url, you will need to revoke this postit and create a new one.
Update a postit from unlimited to limited, in JSON format
curl -H "Authorization: Bearer $AUTH_TOKEN" 'https://api.tacc.utexas.edu/postits/v2/update/0feb1aa5-01aa-4445-b580-a008064a4c44-010' \
-X POST -d '{"maxUses": 100, "lifetime": 2000, "unlimited": false}' -H "Content-type: application/json"
Update a postit from limited to unlimited, using form parameters
curl -H "Authorization: Bearer $AUTH_TOKEN" 'https://api.tacc.utexas.edu/postits/v2/update/0feb1aa5-01aa-4445-b580-a008064a4c44-010' \
-X POST -d "unlimited=true"
Redeeming PostIts¶
You redeem a PostIt by making a non-authenticated HTTP request on the PostIt URL. In the above example, that would be https://api.tacc.utexas.edu/postits/v2/0feb1aa5-01aa-4445-b580-a008064a4c44-010. Every time you make a GET request on the PostIt, the remainingUses field decrements by 1 and the numberUsed field increments by 1. This continues until the count hits 0 or the PostIt passes its expiresAt time. If a postit is unlimited, the remainingUses field does not decrement and the expiresAt field is not used; however, the postit retains its original values in case it is later reverted to a limited postit.
The following cURL command redeems the PostIt, downloading the picksumipsum.txt file from your storage system to the user's machine:
curl -s -o picksumipsum.txt 'https://api.tacc.utexas.edu/postits/v2/0feb1aa5-01aa-4445-b580-a008064a4c44-010'
warning: There is no response body when redeeming PostIts, even if the redemption fails.
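The redemption bookkeeping described above can be modeled in a few lines. This hypothetical Python sketch is purely illustrative; the service tracks these counters server-side:

```python
# Field names mirror remainingUses and numberUsed in the PostIt response above.
class PostIt:
    def __init__(self, remaining_uses, unlimited=False):
        self.remaining_uses = remaining_uses
        self.number_used = 0
        self.unlimited = unlimited

    def redeem(self):
        if self.unlimited:
            return True                 # unlimited: remainingUses never decrements
        if self.remaining_uses <= 0:
            return False                # exhausted: redemption silently fails
        self.remaining_uses -= 1
        self.number_used += 1
        return True

p = PostIt(remaining_uses=2)
print([p.redeem() for _ in range(3)])  # [True, True, False]
```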
Forcing PostIt Browser Downloads¶
If you are using PostIts in a browser environment, you can force a file download by adding force=true to the PostIt URL query. If the target URL is a file item, the name of the file item will be included in the Content-Disposition header so the downloaded file has the correct file name. You may also add the same query parameter to any target file item to force the Content-Disposition header from the Files API.
Expiring PostIts¶
In addition to setting expiration parameters when you create a PostIt, you can manually expire a PostIt at any time by making an authenticated DELETE request on the PostIt URL. This will instantly expire, or revoke, the PostIt from further use. A revoked postit cannot be updated.
Manually expiring a PostIt with CLI:
tapis postits delete 0feb1aa5-01aa-4445-b580-a008064a4c44-010
curl -k -H "Authorization: Bearer $AUTH_TOKEN" -X DELETE 'https://api.tacc.utexas.edu/postits/v2/0feb1aa5-01aa-4445-b580-a008064a4c44-010?pretty=true'
Metadata v2¶
The Tapis Metadata service allows you to manage metadata and associate it with Tapis entities via associated UUIDs. It supports JSON schema for structured JSON metadata; it also accepts any valid JSON-formatted metadata or plain text String when no schema is specified. As with other Tapis services, a full access control layer is available, enabling you to keep your metadata private or share it with your colleagues.
Metadata Structure¶
Key-value metadata item
{
"name": "some metadata",
"value": "A model organism..."
}
Structured metadata item, metadata.json
{
"name":"some metadata",
"value":{
"title":"Example Metadata",
"properties":{
"species":"arabidopsis",
"description":"A model organism..."
}
}
}
Every metadata item has four user-assignable fields: name, value, schemaId, and associationIds.
The name field is just that, a user-defined name you give to your metadata item. There is no uniqueness constraint placed on the name field, so it is up to your application to enforce whatever naming policy it sees fit.
Depending on your application needs, you may use the Metadata service as a key-value store, a document store, or both. When using it as a key-value store, you provide text for the value field. When fetching data, you can search by exact value or full-text search as needed.
When using the Metadata service as a document store, you provide a JSON object or array for the value field. In this use case you can leverage additional functionality such as structured queries, atomic updates, etc.
Either use case is acceptable and fully supported. Your application needs will determine the best approach for you to take.
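Both usage styles correspond to request bodies like the examples at the top of this section. A short Python sketch showing the two shapes side by side:

```python
import json

# Key-value usage: a plain string value (mirrors the first example above)
kv_item = {"name": "some metadata", "value": "A model organism..."}

# Document-store usage: a structured JSON value (mirrors metadata.json above)
doc_item = {
    "name": "some metadata",
    "value": {
        "title": "Example Metadata",
        "properties": {"species": "arabidopsis", "description": "A model organism..."},
    },
}

# Either shape serializes to a valid request body for POST /meta/v2/data
for item in (kv_item, doc_item):
    print(json.dumps(item))
```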
Associations¶
Each metadata item also has an optional associationIds field. This field contains a JSON array of Tapis UUIDs to which this metadata item applies. This provides a convenient grouping mechanism by which to organize logically related resources. One common example is creating a metadata item to represent a "data collection" and associating files and folders that may be geographically distributed under that "data collection". Another is creating a metadata item to represent a "project", then sharing the "project" with other users involved in the "project".
Metadata items can also be associated with other metadata items to create hierarchical relationships. Building on the “project” example, additional metadata items could be created for “links”, “videos”, and “experiments” to hold references for categorized groups of postits, video file items, and jobs respectively. Such a model translates well to a user interface layer and eliminates a large amount of boilerplate code in your application.
information_source: | |
---|---|
The associationIds field does not carry with it any special permissions or behavior. It is simply a link between a metadata item and the resources it represents. |
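The hierarchical pattern described above can be sketched client-side. In this hedged Python example, make_item is a hypothetical helper and the UUIDs are locally generated stand-ins; the real service assigns UUIDs when items are created:

```python
import uuid

# Illustrative only: the service assigns real UUIDs; these are local stand-ins.
def make_item(name, value, association_ids=()):
    return {
        "uuid": str(uuid.uuid4()),
        "name": name,
        "value": value,
        "associationIds": list(association_ids),
    }

project = make_item("project", {"title": "My Project"})
links = make_item("links", {"items": []}, association_ids=[project["uuid"]])
videos = make_item("videos", {"items": []}, association_ids=[project["uuid"]])

# Children are discoverable by searching for the project's UUID in associationIds
children = [i["name"] for i in (links, videos) if project["uuid"] in i["associationIds"]]
print(children)  # ['links', 'videos']
```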
Creating Metadata¶
Create a new metadata item
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
-H 'Content-Type: application/json' \
--data-binary '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model organism..."}}, "name": "mustard plant"}' \
https://api.tacc.utexas.edu/meta/v2/data?pretty=true
tapis meta create -v -V '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model organism..."}}, "name": "mustard plant"}'
The response will look something like the following:
{
"uuid": "4054837257140638186-242ac116-0001-012",
"schemaId": null,
"internalUsername": null,
"owner": "sgopal",
"associationIds": [],
"name": "sgopal.c41109da13893b6f.200414T000224Z",
"value": {
"value": {
"title": "Example Metadata",
"properties": {
"species": "arabidopsis",
"description": "A model organism..."
}
},
"name": "mustard plant"
},
"created": "2020-04-13T19:02:24.336-05:00",
"lastUpdated": "2020-04-13T19:02:24.336-05:00",
"_links": {
"self": {
"href": "https://api.sd2e.org/meta/v2/data/4054837257140638186-242ac116-0001-012"
},
"permissions": {
"href": "https://api.sd2e.org/meta/v2/data/4054837257140638186-242ac116-0001-012/pems"
},
"owner": {
"href": "https://api.sd2e.org/profiles/v2/sgopal"
},
"associationIds": []
}
}
New metadata items are created in the repository via a POST to the collection URL. As mentioned before, there is no uniqueness constraint placed on metadata items. Thus, repeatedly POSTing the same metadata item to the service will create duplicate entries, each with its own unique UUID assigned by the service.
Updating Metadata¶
Update a metadata item
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
-H 'Content-Type: application/json' \
--data-binary '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model plant organism..."}}, "name": "some metadata", "associationIds":["179338873096442342-242ac113-0001-002","6608339759546166810-242ac114-0001-007"]}' \
https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012?pretty=true
tapis meta update -v -V '{"value": {"title": "Example Metadata", "properties": {"species": "arabidopsis", "description": "A model plant organism..."}}, "name": "some metadata", "associationIds":["179338873096442342-242ac113-0001-002","6608339759546166810-242ac114-0001-007"]}' 9057222358650121750-242ac116-0001-012
The response will look something like the following:
{
"uuid": "7341557475441971686-242ac11f-0001-012",
"schemaId": null,
"internalUsername": null,
"associationIds": [
"179338873096442342-242ac113-0001-002",
"6608339759546166810-242ac114-0001-007"
],
"lastUpdated": "2016-08-29T05:51:39.908-05:00",
"name": "some metadata",
"value": {
"title": "Example Metadata",
"properties": {
"species": "arabidopsis",
"description": "A model plant organism..."
}
},
"created": "2016-08-29T05:43:18.618-05:00",
"owner": "nryan",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012"
},
"permissions": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"associationIds": [
{
"rel": "179338873096442342-242ac113-0001-002",
"href": "https://api.tacc.utexas.edu/files/v2/media/system/storage.example.com//",
"title": "file"
},
{
"rel": "6608339759546166810-242ac114-0001-007",
"href": "https://api.tacc.utexas.edu/jobs/v2/6608339759546166810-242ac114-0001-007",
"title": "job"
}
]
}
}
Updating metadata is done by POSTing an updated metadata object to the existing resource. When updating, it is important to note that it is not possible to change the uuid, owner, lastUpdated, or created fields. Those fields are managed by the service.
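A common client pattern is to fetch an item, modify it, and POST it back with the service-managed fields stripped. This minimal Python sketch is an assumption for illustration; updatable_body and the field list are not part of any Tapis SDK:

```python
# The service manages uuid, owner, created, lastUpdated, and _links; a client
# should resubmit only the writable fields.
MANAGED_FIELDS = {"uuid", "owner", "created", "lastUpdated", "_links"}

def updatable_body(item):
    """Return a copy of a fetched metadata item with service-managed fields removed."""
    return {k: v for k, v in item.items() if k not in MANAGED_FIELDS}

fetched = {
    "uuid": "7341557475441971686-242ac11f-0001-012",
    "owner": "nryan",
    "name": "some metadata",
    "value": {"title": "Example Metadata"},
    "created": "2016-08-29T05:43:18.618-05:00",
    "lastUpdated": "2016-08-29T05:51:39.908-05:00",
}
print(sorted(updatable_body(fetched)))  # ['name', 'value']
```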
Deleting Metadata¶
Delete a metadata item
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012?pretty=true
tapis meta delete 7341557475441971686-242ac11f-0001-012
An empty response will be returned from the service.
To delete a metadata item, simply make a DELETE request on the metadata resource.
warning: | Deleting a metadata item will permanently delete the item and all its permissions, etc. |
---|
Metadata details¶
Fetching a metadata item
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012?pretty=true
tapis meta show -v 6877878304112316906-242ac116-0001-012
The response will look something like the following:
{
"uuid": "6877878304112316906-242ac116-0001-012",
"schemaId": null,
"internalUsername": null,
"owner": "sgopal",
"associationIds": [],
"name": "sgopal.c41109da13893b6f.200414T001817Z",
"value": {
"value": {
"title": "Example Metadata",
"properties": {
"species": "arabidopsis",
"description": "A model organism..."
}
},
"name": "mustard plant"
},
"created": "2020-04-13T19:18:17.567-05:00",
"lastUpdated": "2020-04-13T19:18:17.567-05:00",
"_links": {
"self": {
"href": "https://api.sd2e.org/meta/v2/data/6877878304112316906-242ac116-0001-012"
},
"permissions": {
"href": "https://api.sd2e.org/meta/v2/data/6877878304112316906-242ac116-0001-012/pems"
},
"owner": {
"href": "https://api.sd2e.org/profiles/v2/sgopal"
},
"associationIds": []
}
}
To fetch a detailed description of a metadata item, make a GET request on the resource URL. The response will be the full metadata item representation. Two points of interest stand out in the example response. First, the response does not have an id field; instead, it has a uuid field which serves as its ID. This is the result of backwards-compatibility support for legacy consumers and will be changed in the next major release.
The second point of interest is the _links.associationIds array in the hypermedia response. This contains an expanded representation of the associationIds field in the body. The objects in this array are similar to the information you would receive by calling the UUID API to resolve each of the associationIds array values. By leveraging the information in the hypermedia response, you can save several round trips when resolving basic information about the resources the associationIds represent.
information_source: | |
---|---|
In the event you need the entire resource representations for each associationIds value, you can simply explode the json array into a comma-separated string and call the UUID API with expand=true in the query. |
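Concretely, the comma-separated form described in the note can be built as below. This is a hedged Python sketch; the uuids parameter name is an assumption about the UUID API, so verify it against that API's reference before relying on it:

```python
from urllib.parse import urlencode

association_ids = [
    "179338873096442342-242ac113-0001-002",
    "6608339759546166810-242ac114-0001-007",
]

# Collapse the JSON array into a comma-separated string and request full
# representations with expand=true.
query = urlencode({"uuids": ",".join(association_ids), "expand": "true"})
print(query)
```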
Metadata browsing¶
Listing your metadata
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
'https://api.tacc.utexas.edu/meta/v2/data?limit=1&pretty=true'
tapis meta list -v -l 1
The response will look something like the following:
[
{
"uuid": "6877878304112316906-242ac116-0001-012",
"owner": "sgopal",
"associationIds": [],
"name": "sgopal.c41109da13893b6f.200414T001817Z",
"value": {
"value": {
"title": "Example Metadata",
"properties": {
"species": "arabidopsis",
"description": "A model organism..."
}
},
"name": "mustard plant"
},
"created": "2020-04-13T19:18:17.567-05:00",
"lastUpdated": "2020-04-13T19:18:17.567-05:00",
"_links": {
"self": {
"href": "https://api.sd2e.org/meta/v2/data/6877878304112316906-242ac116-0001-012"
},
"permissions": {
"href": "https://api.sd2e.org/meta/v2/data/6877878304112316906-242ac116-0001-012/pems"
},
"owner": {
"href": "https://api.sd2e.org/profiles/v2/sgopal"
},
"associationIds": []
}
}
]
To browse your metadata, make a GET request against the /meta/v2/data collection. This will return all the metadata you created and to which you have been granted READ access, including any metadata items that have been shared with the public or world users. In practice, users will have many metadata items created and shared with them as part of normal use of the platform, so pagination and search become important aspects of interacting with the service.
Admins have implicit access to all metadata, so their default listing response would be a paginated list of every metadata item in the tenant. To avoid such a scenario, admin users can append privileged=false to bypass implicit permissions and only return the metadata items they own or to which they have been granted explicit access.
Metadata Validation¶
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
-H 'Content-Type: application/json' \
--data-binary '{"schemaId": "4736020169528054246-242ac11f-0001-013", "value": {"title": "Example Metadata", "properties": {"description": "A model organism..."}}, "name": "some metadata"}' \
https://api.tacc.utexas.edu/meta/v2/data
tapis meta create -v -V '{"schemaId": "4736020169528054246-242ac11f-0001-013", "value": {"title": "Example Metadata", "properties": {"description": "A model organism..."}}, "name": "some metadata"}'
The response will look something like the following:
{
"status" : "error",
"message" : "Metadata value does not conform to schema.",
"version" : "2.1.8-r8bb7e86"
}
Oftentimes it is necessary to validate metadata for format or simple quality control. The Metadata service can validate the value of a metadata item against a predefined JSON Schema definition. To leverage this feature, you must first register your JSON Schema definition with the Metadata Schemata service, then reference the UUID of that metadata schema resource in the schemaId field.
Given our previous example metadata schema object, the request above fails due to a missing "species" value in the metadata item's value field.
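The rule the validation enforces can be approximated locally. This minimal Python sketch of the required-field check uses a hypothetical check_required helper; the real service runs full JSON Schema validation server-side:

```python
# Minimal local approximation of the required-field rule from the example schema.
def check_required(value, required=("species",)):
    missing = [f for f in required if f not in value.get("properties", {})]
    if missing:
        return {"status": "error", "message": "Metadata value does not conform to schema."}
    return {"status": "success"}

bad = {"title": "Example Metadata", "properties": {"description": "A model organism..."}}
good = {"title": "Example Metadata", "properties": {"species": "arabidopsis"}}
print(check_required(bad)["status"], check_required(good)["status"])  # error success
```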
Metadata Searching¶
Searching metadata for all items with name like “mustard plant”
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -G \
--data-urlencode 'q={"name": "mustard plant"}' \
https://api.tacc.utexas.edu/meta/v2/data
tapis meta search --name like "mustard"
The response will look something like the following:
[
{
"uuid": "7341557475441971686-242ac11f-0001-012",
"schemaId": null,
"internalUsername": null,
"associationIds": [
"179338873096442342-242ac113-0001-002",
"6608339759546166810-242ac114-0001-007"
],
"lastUpdated": "2016-08-29T05:51:39.908-05:00",
"name": "some metadata",
"value": {
"title": "Example Metadata",
"properties": {
"species": "arabidopsis",
"description": "A model plant organism..."
}
},
"created": "2016-08-29T05:43:18.618-05:00",
"owner": "nryan",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012"
},
"permissions": {
"href": "https://api.tacc.utexas.edu/meta/v2/data/7341557475441971686-242ac11f-0001-012/pems"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
},
"associationIds": [
{
"rel": "179338873096442342-242ac113-0001-002",
"href": "https://api.tacc.utexas.edu/files/v2/media/system/storage.example.com//",
"title": "file"
},
{
"rel": "6608339759546166810-242ac114-0001-007",
"href": "https://api.tacc.utexas.edu/jobs/v2/6608339759546166810-242ac114-0001-007",
"title": "job"
}
]
}
}
]
In addition to retrieving metadata via its UUID, the Metadata service supports MongoDB query syntax. Just add q=<value> to the URL query portion of your GET request on the metadata collection. This differs from other APIs, but provides a richer syntax to query and filter responses.
If you want to look up metadata by a specific value within its JSON metadata value, you can specify it using a JSON object such as {"name": "mustard plant"}. Remember that, in order to send JSON in a URL query string, it must first be URL encoded. Luckily this is handled for us by curl and the Tapis CLI.
The given query will return all metadata with name, “mustard plant” that you have permission to access.
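For clients that build the URL by hand, the encoding step looks like this (a minimal Python sketch using only the standard library):

```python
import json
from urllib.parse import quote

# URL-encode the JSON query before placing it in the q= parameter
query = {"name": "mustard plant"}
url = "https://api.tacc.utexas.edu/meta/v2/data?q=" + quote(json.dumps(query))
print(url)
```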
Search Examples¶
metadata search by exact name
{"name": "mustard plant"}
metadata search by field in value
{"value.type": "a plant"}
metadata search for values with any field matching an item in the given array
{ "value.profile.status": { "$in": [ "active", "paused" ] } }
metadata search for items with a name matching a case-insensitive regex
{ "name": { "$regex": "^Cactus.*", "$options": "i"}}
metadata search for value by regex matched against each line of a value
{ "value.description": { "$regex": ".*monocots.*", "$options": "m"}}
metadata search for value by conditional queries
{
  "$or":[
    {
      "value.description":{
        "$regex":".*(prickly pear|tapis|century).*",
        "$options":"i"
      }
    },
    {
      "value.title":{
        "$regex":".*Cactus$"
      },
      "value.order":{
        "$regex":"Agavoideae"
      }
    }
  ]
}
Some common search syntax examples. Consult the MongoDB Query Documentation for more examples and full syntax documentation.
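To get a feel for the $options flags used above, Python's re module is a reasonable local stand-in for these simple patterns. Treat this as an approximation: MongoDB evaluates $regex with its own engine:

```python
import re

# $options: "i" -> case-insensitive matching
assert re.search("^Cactus.*", "cactus flower", re.IGNORECASE)

# $options: "m" -> ^ and $ anchor at each line of a multi-line value
assert re.search(r"^about.*monocots", "line one\nabout monocots", re.MULTILINE)

print("flags behave like the $options semantics shown above")
```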
Metadata Permissions¶
The Metadata service supports permissions for both Metadata and Schemata consistent with that of a number of other Tapis services. If no permissions are explicitly set, only the owner of the Metadata and tenant administrators can access it.
The permissions available for Metadata and Metadata Schemata are listed in the following table. Please note that a user must have WRITE permissions to grant or revoke permissions on a metadata or schema item.
Name | Description |
---|---|
READ | User can view the resource |
WRITE | User can edit, but not view the resource |
READ_WRITE | User can view and edit the resource |
ALL | User can manage the resource |
NONE | User has no access to the resource |
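The permission names map onto the read/write booleans that appear in permission-listing responses. A small Python sketch of that mapping, derived from the table above (illustrative only):

```python
# Grantable permission names -> the flags shown in listing responses,
# e.g. {"read": true, "write": true}
PERMISSION_FLAGS = {
    "READ":       {"read": True,  "write": False},
    "WRITE":      {"read": False, "write": True},
    "READ_WRITE": {"read": True,  "write": True},
    "ALL":        {"read": True,  "write": True},
    "NONE":       {"read": False, "write": False},
}

print(PERMISSION_FLAGS["READ_WRITE"])
```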
information_source: | |
---|---|
You will need to change the UUIDs and usernames for the queries below to work. |
Listing all permissions¶
List all permissions on a Metadata item
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems?pretty=true
tapis meta pems list -v 6877878304112316906-242ac116-0001-012
The response will look something like the following:
{
"username": "sgopal",
"permission": {
"read": true,
"write": true
},
"_links": {
"self": {
"href": "https://api.sd2e.org/meta/v2/6877878304112316906-242ac116-0001-012/pems/sgopal"
},
"parent": {
"href": "https://api.sd2e.org/meta/v2/6877878304112316906-242ac116-0001-012"
},
"profile": {
"href": "https://api.sd2e.org/meta/v2/sgopal"
}
}
}
To list all permissions for a metadata item, make a GET request on the metadata item's permission collection.
List permissions for a specific user¶
List the permissions on Metadata for a given user
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems/sgopal?pretty=true
tapis meta pems show -v \
6877878304112316906-242ac116-0001-012 sgopal
The response will look something like the following:
{
"username":"sgopal",
"permission":{
"read":true,
"write":true
},
"_links":{
"self":{
"href":"https://api.tacc.utexas.edu/meta/v2/6877878304112316906-242ac116-0001-012/pems/sgopal"
},
"parent":{
"href":"https://api.tacc.utexas.edu/meta/v2/6877878304112316906-242ac116-0001-012"
},
"profile":{
"href":"https://api.tacc.utexas.edu/meta/v2/sgopal"
}
}
}
Checking permissions for a single user is simply a matter of adding the username of the user in question to the end of the metadata permission collection.
Grant permissions¶
Grant read access to a metadata item
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
--data '{"permission":"READ"}' \
https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems/rclemens?pretty=true
tapis meta pems grant -v 6877878304112316906-242ac116-0001-012 rclemens READ
Grant read and write access to a metadata item
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
--data '{"permission":"READ_WRITE"}' \
https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems/rclemens?pretty=true
tapis meta pems grant -v 6877878304112316906-242ac116-0001-012 rclemens READ_WRITE
The response will look something like the following:
{
"username": "rclemens",
"permission": {
"read": true,
"write": true
},
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/meta/v2/6877878304112316906-242ac116-0001-012/pems/rclemens"
},
"parent": {
"href": "https://api.tacc.utexas.edu/meta/v2/6877878304112316906-242ac116-0001-012"
},
"profile": {
"href": "https://api.tacc.utexas.edu/meta/v2/sgopal"
}
}
}
To grant another user read access to your metadata item, assign them READ permission. To enable another user to update a metadata item, grant them READ_WRITE or ALL access.
Delete single user permissions¶
Delete permission for single user on a Metadata item
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems/rclemens?pretty=true
tapis meta pems revoke 6877878304112316906-242ac116-0001-012 rclemens
An empty response will come back from the API.
Permissions may be deleted for a single user by making a DELETE request on the metadata user permission resource. This will immediately revoke all permissions to the metadata item for that user.
information_source: | |
---|---|
Please note that ownership cannot be revoked or reassigned. The user who created the metadata item will always have ownership of that item. |
Deleting all permissions¶
Delete all permissions on a Metadata item
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://api.tacc.utexas.edu/meta/v2/data/6877878304112316906-242ac116-0001-012/pems?pretty=true
tapis meta pems drop 6877878304112316906-242ac116-0001-012
An empty response will be returned from the service.
Permissions may be deleted for all users at once by making a DELETE request on the metadata resource permission collection.
warning: | The above operation will delete all permissions for a Metadata item, such that only the owner will be able to access it. Use with care. |
---|
Metadata Schemata¶
Schema can be provided in JSON Schema form. The service will validate that the schema is valid JSON and store it. To validate metadata against it, the schema UUID should be given as the schemaId parameter when uploading metadata. If no schemaId is provided, the Metadata service will accept any JSON object or plain text string and store it accordingly. This approach gives Tapis a high degree of flexibility in handling structured and unstructured metadata alike.
For more on JSON Schema please see http://json-schema.org/
information_source: | |
---|---|
The metadata service supports both JSON Schema v3 and v4. No additional work is needed on your part to specify which version you want to use; the service will autodetect the version and validate accordingly. |
Creating schemata¶
Example JSON Schema document, schema.json
{
"title": "Example Schema",
"type": "object",
"properties": {
"species": {
"type": "string"
}
},
"required": [
"species"
]
}
Creating a new metadata schema
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X POST -H "Content-Type: application/json" \
--data-binary '{ "title": "Example Schema", "type": "object", "properties": { "species": { "type": "string" } },"required": ["species"] }' \
https://api.tacc.utexas.edu/meta/v2/schemas/
The response will look something like the following:
{
"uuid": "4736020169528054246-242ac11f-0001-013",
"internalUsername": null,
"lastUpdated": "2016-08-29T04:52:11.474-05:00",
"schema": {
"title": "Example Schema",
"type": "object",
"properties": {
"species": {
"type": "string"
}
},
"required": [
"species"
]
},
"created": "2016-08-29T04:52:11.474-05:00",
"owner": "nryan",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013"
},
"permissions": {
"href": "https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013/pems"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
}
}
}
To create a new metadata schema that can be used to validate metadata items upon addition or updating, POST a JSON Schema document to the service.
More JSON Schema examples can be found in the Tapis Samples project.
Updating schema¶
Update a metadata schema
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" -X POST \
-H 'Content-Type: application/json' \
--data-binary '{ "title": "Example Schema", "type": "object", "properties": { "species": { "type": "string" }, "description": {"type":"string"} },"required": ["species"] }' \
https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013
tapis meta update -v -V '{ "title": "Example Schema", "type": "object", "properties": { "species": { "type": "string" }, "description": {"type":"string"} },"required": ["species"] }' 4736020169528054246-242ac11f-0001-013
The response will look something like the following:
{
"uuid": "4736020169528054246-242ac11f-0001-013",
"internalUsername": null,
"lastUpdated": "2016-08-29T04:52:11.474-05:00",
"schema": {
"title": "Example Schema",
"type": "object",
"properties": {
"species": {
"type": "string"
}
},
"required": [
"species"
]
},
"created": "2016-08-29T04:52:11.474-05:00",
"owner": "nryan",
"_links": {
"self": {
"href": "https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013"
},
"permissions": {
"href": "https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013/pems"
},
"owner": {
"href": "https://api.tacc.utexas.edu/profiles/v2/nryan"
}
}
}
Updating metadata schema is done by POSTing an updated schema object to the existing resource. When updating, it is important to note that it is not possible to change the uuid, owner, lastUpdated, or created fields. Those fields are managed by the service.
Deleting schema¶
Delete a metadata schema
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE \
https://api.tacc.utexas.edu/meta/v2/schemas/4736020169528054246-242ac11f-0001-013
An empty response will be returned from the service.
To delete a metadata schema, simply make a DELETE request on the metadata schema resource.
warning: | Deleting a metadata schema will permanently delete the schema and all its history, permissions, etc. Once the schema is deleted, remaining metadata items will not be automatically updated, so updates to metadata items that still reference the schema will fail. |
---|
Specifying schemata as $ref¶
When building new JSON Schema definitions, it is often helpful to break each object out into its own definition and use $ref fields to reference them. The Metadata service supports such references between metadata schema resources. Simply provide the fully qualified URL of another valid metadata schema resource as the value of a $ref field and Tapis will resolve the reference internally, applying the appropriate authentication and authorization for the requesting user to the request for the referenced resource.
warning: | When using Tapis Metadata Schemata as external references in a JSON Schema definition, make sure you grant READ permission or greater on every referenced Tapis Metadata Schema resource needed to resolve the JSON Schema definition. |
---|
Metadata v3¶
Meta V3 is a REST API Microservice for MongoDB which provides server-side Data, Identity and Access Management for Web and Mobile applications.
Overview¶
Meta V3 is:
A Stateless Microservice. With Meta V3, projects can focus on building Angular or other frontend applications, because most of the server-side logic necessary to manage database operations, authentication/authorization, and related APIs is handled automatically, leaving only the UX/UI code to write.
For example, to insert data into MongoDB, a developer just creates client-side JSON documents and executes POST operations over HTTP to Meta V3. Other features of a modern MongoDB installation, such as flexible schema, GeoJSON, and aggregation pipelines, further ease the development process.
Every tenant will have access to at least one database where they can store and manage JSON documents. Documents are the trailing end of a nested hierarchy of data that begins with a database housing one or more collections. The collections house JSON documents whose structure is left up to the administrators of the tenant database.
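The database-to-collection-to-document hierarchy maps directly onto Meta V3's URL structure. A small Python sketch follows; meta_path and BASE_URL are hypothetical illustrations, not part of any Tapis SDK:

```python
# BASE_URL is a placeholder for your tenant's base URL.
BASE_URL = "https://tapis.example.org"  # hypothetical

def meta_path(db, collection=None, doc_id=None):
    """Build a /v3/meta path for a database, collection, or single document."""
    parts = [BASE_URL, "v3", "meta", db]
    if collection:
        parts.append(collection)
    if doc_id:
        parts.append(doc_id)
    return "/".join(parts)

print(meta_path("MyTstDB", "MyCollection"))
print(meta_path("MyTstDB", "MyCollection", "5f1892ece37f7b5a692285e9"))
```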
Permissions for access to databases, collections and documents must be predefined before accessing those resources. The definitions for access are defined within the Security Kernel API of Tapis V3 and must be added by a tenant or service administrator. See the Permissions section below for some examples of permissions definitions and access to resources in the Meta V3 API.
Getting Started¶
Create a document¶
We have a database named MyTstDB and a collection named MyCollection. To add a JSON document to MyCollection, do the following:
With CURL:
$ curl -v -X POST -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" --data '{"name": "test document slt 7.21.2020-14:27","jimmyList": ["1","3"],"description": "new whatever"}' "$BASE_URL/v3/meta/MyTstDB/MyCollection?basic=true"
The response body will be empty with a status code of 201 "Created" unless the "basic" URL query parameter is set to true. Setting "basic" to true returns a Tapis Basic response that includes the "_id" of the newly created document. A more detailed discussion of autogenerated and user-specified ids can be found in the "Create a Document" section under "Document Resources".
{
"result": {
"_id": "5f189316e37f7b5a692285f3"
},
"status": "201",
"message": "Created",
"version": "0.0.1"
}
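For clients not using curl, the same request can be assembled programmatically. The following Python sketch builds (but does not send) the create-document request; BASE_URL, the token placeholder, and the helper name are illustrative, not part of any Tapis SDK.

```python
import json

# Illustrative only: BASE_URL and build_create_request are placeholders,
# not part of a Tapis client library.
BASE_URL = "https://tapis.example.org"

def build_create_request(db, collection, document, basic=True):
    """Return (url, headers, body) for a Meta V3 create-document POST."""
    url = f"{BASE_URL}/v3/meta/{db}/{collection}"
    if basic:
        url += "?basic=true"  # ask for the Tapis Basic response with the new _id
    headers = {
        "Content-Type": "application/json",
        "X-Tapis-Token": "<your JWT>",  # placeholder token
    }
    # json.dumps guarantees a valid body (quoted keys, no trailing commas)
    body = json.dumps(document)
    return url, headers, body

url, headers, body = build_create_request(
    "MyTstDB", "MyCollection",
    {"name": "test document", "jimmyList": ["1", "3"], "description": "new whatever"},
)
```

The returned triple can be handed to any HTTP client; the JSON-serialization step is what prevents the trailing-comma mistake that a hand-written body can introduce.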
List documents¶
Using our MyTstDB/MyCollection resources, we can ask for a default list of documents in MongoDB's default sort order. The document we created earlier should be listed with a new "_id" field autogenerated by MongoDB.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/MyTstDB/MyCollection
The response will be an array of json documents from MyCollection :
[
{
"_id": {
"$oid": "5f189316e37f7b5a692285f3"
},
"name": "test document slt 7.21.2020-14:27",
"jimmyList": [
"1",
"3"
],
"description": "new whatever",
"_etag": {
"$oid": "5f189316296c81742a6a3e4c"
}
},
{
"_id": {
"$oid": "5f1892ece37f7b5a692285e9"
},
"name": "test document slt 7.21.2020-14:25",
"jimmyList": [
"1",
"3"
],
"description": "new whatever",
"_etag": {
"$oid": "5f1892ec296c81742a6a3e4b"
}
}
]
Get a document¶
If we know the “_id” of a created document, we can ask for it directly.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/MyTstDB/MyCollection/5f1892ece37f7b5a692285e9
The response will be a json document from MyCollection with the “_id” of 5f1892ece37f7b5a692285e9 :
{
"_id": {
"$oid": "5f1892ece37f7b5a692285e9"
},
"name": "test document slt 7.21.2020-14:25",
"jimmyList": [
"1",
"3"
],
"description": "new whatever",
"_etag": {
"$oid": "5f1892ec296c81742a6a3e4b"
}
}
Find a document¶
We can pass a query parameter named “filter” and set the value to a json MongoDB query document. Let’s find a document by a specific “name”.
With CURL:
$ curl -v -G -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" --data-urlencode filter='{"name": "test document slt 7.21.2020-14:25"}' $BASE_URL/v3/meta/MyTstDB/MyCollection
The response will be an array of json documents from MyCollection :
[
{
"_id": {
"$oid": "5f1892ece37f7b5a692285e9"
},
"name": "test document slt 7.21.2020-14:25",
"jimmyList": [
"1",
"3"
],
"description": "new whatever",
"_etag": {
"$oid": "5f1892ec296c81742a6a3e4b"
}
}
]
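The filter value must be percent-encoded when placed in a URL (curl's --data-urlencode does this for you). Here is a small Python sketch of producing the same encoded query string; build_filter_query is an illustrative helper, not a Tapis API.

```python
import json
from urllib.parse import urlencode

def build_filter_query(mongo_query):
    """Percent-encode a MongoDB query document for the "filter" parameter."""
    return urlencode({"filter": json.dumps(mongo_query)})

qs = build_filter_query({"name": "test document slt 7.21.2020-14:25"})
# Append to the collection URL:
#   $BASE_URL/v3/meta/MyTstDB/MyCollection?<qs>
```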
Resources¶
General resources¶
An unauthenticated health check is included in the Meta V3 API to let any user know the current condition of the service.
Health Check
An unauthenticated request for the health status of Meta V3 API.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" $BASE_URL/v3/meta/
The response will be a Basic Tapis response on health:
{
"result": "",
"status": "200",
"message": "OK",
"version": "0.0.1"
}
Root resources¶
The Root resource space represents the root namespace for databases on the MongoDB host. All databases are located here. Requests to this space are limited to READ only for tenant administrators.
List DB Names
A request to the Root resource will list Database names found on the server. This request has been limited to those users with tenant administrative roles.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/
The response will be a JSON list of database names:
[
"StreamsDevDB",
"v1airr"
]
Database resources¶
The Database resource is the top level for many tenant projects. The resource maps directly to a named MongoDB database on the database server. Database names are case-sensitive and must match exactly when making requests for collections or documents.
List Collection Names
This request will return a list of collection names from the specified database {db}. The permissions for access to the database are set prior to access.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}
Here is an example response:
[
"streams_alerts_metadata",
"streams_channel_metadata",
"streams_instrument_index",
"streams_project_metadata",
"streams_templates_metadata",
"tapisKapa-local"
]
Get DB Metadata
This request will return the metadata properties associated with the database. The core server generates an Etag in the _properties collection for each database; this Etag is required for future deletion.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/_meta
Here is an example response:
{
"_id": "_meta",
"_etag": { "$oid": "5ef6232b296c81742a6a3e02" }
}
Create DB
TODO: this implementation is not exposed. Creation of a database by tenant administrators is scheduled for inclusion in an administrative interface API in a future release.
This request will create a new named database in the MongoDb root space by a tenant or service administrator.
With CURL:
$ curl -v -X PUT -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -d '' $BASE_URL/v3/meta/{db}
Here is an example response:
{ }
Delete a DB
TODO: this implementation is not exposed. Deletion of a database by tenant administrators is scheduled for inclusion in an administrative interface API in a future release.
This request will delete a named database in the MongoDb root space by a tenant or service administrator.
With CURL:
$ curl -v -X DELETE -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -d '' $BASE_URL/v3/meta/{db}
Here is an example response:
{ }
Collection Resources¶
The Collection resource allows requests for managing and querying json documents within a MongoDB collection.
Create a Collection
You can create a new collection of documents by specifying a collection name under a specific database: /v3/meta/{db}/{collection}
With CURL:
$ curl -v -X PUT -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}
Here is an example response:
Empty response with HTTP status of 201
List Documents
A default number of documents found in the collection is returned as an array of documents.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}
The response will look like the following:
[
{
"_id": {
"$oid": "5f1892ece37f7b5a692285e9"
},
"name": "test document slt 7.21.2020-14:25",
"description": "new whatever",
"_etag": {
"$oid": "5f1892ec296c81742a6a3e4b"
}
},
{
"_id": {
"$oid": "5f1892ece37f7b5a69228533"
},
"name": "test document slt 7.21.2020-14:25",
"description": "new whatever",
"_etag": {
"$oid": "5f1892ec296c81742a6a3e444"
}
}
]
List Documents Large Query
When a filter query document is too large to pass as a URL query parameter, it can be submitted in the request body to the _filter endpoint. A default number of documents matching the query is returned as an array of documents.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -d @FILENAME $BASE_URL/v3/meta/{db}/{collection}/_filter
The response will look like the following:
[
{
"_id": {
"$oid": "5f1892ece37f7b5a692285e9"
},
"name": "test document slt 7.21.2020-14:25",
"description": "new whatever",
"_etag": {
"$oid": "5f1892ec296c81742a6a3e4b"
}
},
{
"_id": {
"$oid": "5f1892ece37f7b5a69228533"
},
"name": "test document slt 7.21.2020-14:25",
"description": "new whatever",
"_etag": {
"$oid": "5f1892ec296c81742a6a3e444"
}
}
]
Delete a Collection
This administrative method is only available to tenant or meta administrators and requires an If-Match header containing the Etag for the collection. The Etag value, if not already known, can be retrieved via the "_meta" call on the collection.
With CURL:
$ curl -v -X DELETE -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -H "If-Match:{etag}" $BASE_URL/v3/meta/{db}/{collection}
Here is an example response:
Empty response body with status code 204
Get Collection Size
You can find the number of documents in a given collection by calling "_size" on the collection.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/_size
Here is an example response:
TODO
Get Collection Metadata
You can find the metadata properties of a given collection by calling “_meta” on a collection. This would include the Etag value for a collection that is needed for deletion.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/_meta
Here is an example response:
{
"_id": "_meta",
"_etag": {
"$oid": "5f2b2b7a204ce7637579c85f"
}
}
Document Resources¶
Document resources are JSON documents found in a collection. Reading, creating, deleting, and updating documents, along with batch processing, make up the operations that can be applied to documents in a collection. There are various ways to retrieve one or more documents from a collection, including a filter query parameter whose value takes the form of a MongoDB query document. Batch addition of documents, as well as batch updates based on queries, is also allowed.
Create a Document
Create a new document within a collection. Submitting a JSON document in the request body of a POST request creates a new document within the specified collection with a MongoDB autogenerated "_id". Batch document addition is possible by POSTing an array of new documents in the request body for the specified collection. The rules for "_id" creation operate the same way for multiple documents as they do for a single document.
The default representation returned is an empty response body along with a 201 HTTP status code "Created". However, if an additional query parameter named "basic" is added with the value "true", a basic Tapis response is returned along with the newly created "_id" of the document.
With CURL:
$ curl -v -X POST -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -d '{"docName":"test doc"}' $BASE_URL/v3/meta/{db}/{collection}
Here is an example response:
Empty response
Multiple documents can be added to a collection by POSTing a json array of documents. The batch addition of documents only supports the default response.
With CURL:
$ curl -v -X POST -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -d '[{"docName":"test doc1"},{"docName":"test doc2"}]' $BASE_URL/v3/meta/{db}/{collection}
The response body will be empty:
Get a Document
Get a specific document by its “_id”.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/{document_id}
The response will be the JSON document with the matching "_id":
{
  "_id": {
    "$oid": "5f1892ece37f7b5a692285e9"
  },
  "docName": "test doc",
  "_etag": {
    "$oid": "5f1892ec296c81742a6a3e4b"
  }
}
Replace a Document
This call replaces an existing document identified by document id (“_id”), with the json supplied in the request body.
With CURL:
$ curl -v -X PUT -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -d '{"docName":"test doc another one"}' $BASE_URL/v3/meta/{db}/{collection}/{document_id}
Here is an example response:
TODO
Modify a Document
This call will replace a portion of a document identified by document id (“_id”) with the supplied json.
With CURL:
$ curl -v -X PATCH -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -d '{"docName":"test changed"}' $BASE_URL/v3/meta/{db}/{collection}/{document_id}
Here is an example response:
TODO
Delete Document
Deleting a document with a specific document id ("_id") removes it from the collection.
With CURL:
$ curl -v -X DELETE -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/{document_id}
Here is an example response:
TODO
Index Resources¶
Indexes can help speed up queries of your collection, and the API gives you the ability to define and manage your indexes. You can create an index for a collection, list the indexes for a collection, and delete an index. Indexes can't be updated; they must be deleted and recreated.
List Indexes
List the indexes defined for a collection.
With CURL:
$ curl -v -X GET -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/_indexes
Here is an example response:
TODO
Create Index
Create a new index with a new name. To create an index you have to specify the keys and the index options. Let's create a unique, sparse index on property qty and name our index "qtyIndex".
PUT /v3/meta/{db}/{collection}/_indexes/qtyIndex
{"keys": {"qty": 1},"ops": {"unique": true, "sparse": true }}
With CURL:
$ curl -v -X PUT -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -d '{ "keys": <keys>, "ops": <options> }' $BASE_URL/v3/meta/{db}/{collection}/_indexes/{indexName}
Here is an example response:
TODO
Delete Index
Remove a named Index from the index list.
With CURL:
$ curl -v -X DELETE -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/{db}/{collection}/_indexes/{indexName}
Here is an example response:
TODO
Aggregation Resources¶
Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result. Aggregations in the API are predefined and added to a collection's properties. They may also be parameterized for use with multiple sets of inputs.
Create an Aggregation
Create an aggregation pipeline by adding the aggregation to the collection for future execution. The aggregation may have variables that are defined so that a future request may pass variable values for aggregation execution. See “Execute an Aggregation”.
{ "aggrs" : [
{ "stages" : [ { "$match" : { "name" : { "$var" : "n" } } },
{ "$group" : { "_id" : "$name",
"avg_age" : { "$avg" : "$age" }
} }
],
"type" : "pipeline",
"uri" : "example-pipeline"
}
]
}
Property | Mandatory | Description |
---|---|---|
type | yes | The aggregation type; "pipeline" in the example above |
uri | yes | The name used to reference the aggregation for execution (e.g. "example-pipeline") |
stages | yes | The array of aggregation pipeline stages |
For more information refer to https://docs.mongodb.org/manual/core/aggregation-pipeline/
With CURL:
$ curl -v -X PUT -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" \
  -d '{ "aggrs" : [{ "stages" : [ { "$match" : { "name" : { "$var" : "n" } } },{ "$group" : { "_id" : "$name","avg_age" : { "$avg" : "$age" }} } ], "type" : "pipeline","uri" : "example-pipeline"}]}' \
  $BASE_URL/v3/meta/{db}/{collection}
Here is an example response:
TODO
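To make the $var mechanism concrete, here is a client-side Python sketch of the substitution that happens when a parameterized aggregation is executed with bound variables. bind_vars is purely illustrative; the real binding is performed server-side when the aggregation runs.

```python
# Illustrative sketch: every {"$var": <name>} placeholder in the stored
# pipeline is replaced by the supplied value before the stages execute.
def bind_vars(node, values):
    """Recursively substitute {"$var": name} placeholders with values[name]."""
    if isinstance(node, dict):
        if set(node) == {"$var"}:          # a placeholder node
            return values[node["$var"]]
        return {k: bind_vars(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [bind_vars(v, values) for v in node]
    return node                            # scalars pass through unchanged

# The stages from the "example-pipeline" definition above:
stages = [
    {"$match": {"name": {"$var": "n"}}},
    {"$group": {"_id": "$name", "avg_age": {"$avg": "$age"}}},
]
bound = bind_vars(stages, {"n": "nryan"})
```

After binding, the $match stage filters on the concrete name while operator expressions like {"$avg": "$age"} are left untouched.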
Execute an Aggregation
TODO
With CURL:
$ curl -v -X POST -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" -d '' $BASE_URL/v3/meta/
Here is an example response:
TODO
Delete an Aggregation
TODO
With CURL:
$ curl -v -X DELETE -H "Content-Type:application/json" -H "X-Tapis-Token:$jwt" $BASE_URL/v3/meta/
Here is an example response:
TODO
User Profiles¶
The Tapis hosted identity service (profiles service) is a RESTful web service that gives organizations a way to create and manage the user accounts within their Tapis tenant. The service is backed by a redundant LDAP instance hosted in multiple datacenters making it highly available. Additionally, passwords are stored using the openldap md5crypt algorithm.
Tenant administrators can manage only a basic set of fields on each user account within LDAP itself. For more complex profiles, we recommend combining the profiles service with the metadata service. See the section on Extending the Basic Profile with the Metadata Service below.
The service uses OAuth2 for authentication, and users must have special privileges to create and update user accounts within the tenant. Please work with the Tapis development team to make sure your admins have the user-account-manager role.
In addition to the web service, there is also a basic front-end web application providing user sign up. The web application will suffice for basic user profiles and can be used as a starting point for more advanced use cases.
This service should NOT be used for authenticating users. For details on using OAuth for authentication, see the Authorization Guide
Creating¶
Create a user account with the following CLI command:
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X POST \
-d "username=testuser" \
-d "password=abcd123" \
-d "email=testuser@test.com" \
https://api.tacc.utexas.edu/profiles/v2
{
"message":"User created successfully.",
"result":{
"email":"testuser@test.com",
"first_name":"",
"full_name":"testuser",
"last_name":"testuser",
"mobile_phone":"",
"phone":"",
"status":"Active",
"uid":null,
"username":"testuser"
},
"status":"success",
"version":"2.0.0-SNAPSHOT-rc3fad"
}
Create a user account by sending a POST request to the profiles service, providing an access token of a user with the user-account-manager role. The fields username, password and email are required to create a new user.
Creating and managing accounts requires a special user-account-manager role. As a best practice, we recommend setting up a separate, dedicated account to handle user management. Please work with the Tapis developer team if this is of interest to your organization.
The complete list of available fields and their descriptions is provided in the table below.
Field Name | Description | Required? |
---|---|---|
username | The username for the user; must be unique across the tenant | Yes |
email | The email address for the user | Yes |
password | The password for the user | Yes |
first_name | First name of the user | No |
last_name | Last name of the user | No |
phone | User’s phone number | No |
mobile_phone | User’s mobile phone number | No |
Note that the service does not do any password strength enforcement or other password management policies. We leave it to each organization to implement the policies best suited for their use case.
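The create-user request above is ordinary form-encoded data. Here is a Python sketch of assembling that body from the fields in the table; the helper name and values are illustrative, not part of any Tapis SDK.

```python
from urllib.parse import urlencode

# Illustrative helper: build the form-encoded body for the
# POST /profiles/v2 create-user call.
def user_form(username, password, email, **optional):
    """username, password, and email are required; optional keyword
    arguments cover fields like first_name, last_name, phone."""
    fields = {"username": username, "password": password, "email": email}
    fields.update(optional)
    return urlencode(fields)

body = user_form("testuser", "abcd123", "testuser@test.com",
                 first_name="Test", last_name="User")
```

The body would be sent with a Content-Type of application/x-www-form-urlencoded, matching the -d usage in the curl example.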
Extending with Metadata¶
Here is an example metadata object for extending a user profile:
{
"name":"user_profile",
"value":{
"firstName":"Test",
"lastName":"User",
"email":"testuser@test.com",
"city":"Springfield",
"state":"IL",
"country":"USA",
"phone":"636-555-3226",
"gravatar":"http://www.gravatar.com/avatar/ed53e691ee322e24d8cc843fff68ebc6"
}
}
Save the extended profile document to the metadata service with the following CLI command:
tapis metadata update -v -F profile_example.json
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X POST \
-F "fileToUpload=@profile_ex" \
https://api.tacc.utexas.edu/meta/v2/data/?pretty=true
{
"status" : "success",
"message" : null,
"version" : "2.1.0-rc0c5a",
"result" : {
"uuid" : "0001429724043699-5056a550b8-0001-012",
"owner" : "jstubbs",
"schemaId" : null,
"internalUsername" : null,
"associationIds" : [ ],
"lastUpdated" : "2015-04-22T12:34:03.698-05:00",
"name" : "user_profile",
"value" : {
"firstName" : "Test",
"lastName" : "User",
"email" : "testuser@test.com",
"city" : "Springfield",
"state" : "IL",
"country" : "USA",
"phone" : "636-555-3226",
"gravatar" : "http://www.gravatar.com/avatar/ed53e691ee322e24d8cc843fff68ebc6"
},
"created" : "2015-04-22T12:34:03.698-05:00",
"_links" : {
"self" : {
"href" : "https://api.tacc.utexas.edu/meta/v2/data/0001429724043699-5056a550b8-0001-012"
}
}
}
}
We do not expect the fields above to provide full support for anything but the most basic profiles. The recommended strategy is to use the profiles service in combination with the metadata service (see Metadata Guide) to store additional information. The metadata service allows you to create custom types using JSON schema, making it more flexible than standard LDAP from within a self-service model. Additionally, the metadata service includes a rich query interface for retrieving users based on arbitrary JSON queries.
The general approach used by existing tenants has been to create a single entry per user where the entry contains all additional profile data for the user. Every metadata item representing a user profile can be identified using a fixed string for the name attribute (e.g., user_profile). The value of the metadata item contains a unique identifier for the user (e.g., username or email address) along with all the additional fields you wish to track on the profile. One benefit of this approach is that it cleanly delineates multiple classes of profiles, for example admin_profile, developer_profile, mathematician_profile, etc. When consuming this information in a web interface, such user-type grouping makes presentation significantly easier.
Another issue to consider when extending user profile information through the Metadata service is ownership. If you create the user's account and then prompt them to log in before entering their extended data, it is possible to create the user's metadata record under their account. This has the advantage of giving the user full ownership over the information; however, it also opens up the possibility that the user, or a third-party application, could modify or delete the record.
A better approach is to use a service account to create all extended profile metadata records and grant the user READ access on each record. This still allows third-party applications to access the user's information at their request, but prevents malicious modification or deletion.
For even quicker access, you can associate the metadata record with the UUID of the user through the associationIds attribute. See the Metadata Guide for more information about efficiently storing and searching metadata.
Updating¶
Update a user profile with the following CLI command:
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X PUT \
-d "password=abcd123&email=testuser@test.com&first_name=Test&last_name=User" \
https://api.tacc.utexas.edu/profiles/v2/testuser
{
"message":"User updated successfully.",
"result":{
"create_time":"20150421153504Z",
"email":"testuser@test.com",
"first_name":"Test",
"full_name":"Test User",
"last_name":"User",
"mobile_phone":"",
"phone":"",
"status":"Active",
"uid":0,
"username":"testuser"
},
"status":"success",
"version":"2.0.0-SNAPSHOT-rc3fad"
}
Updates to existing users can be made by sending a PUT request to the user's profile resource (e.g., https://api.tacc.utexas.edu/profiles/v2/testuser) and passing the fields to update. For example, we can add a gravatar attribute to the account we created above.
Deleting¶
Delete a user profile with the following CLI command:
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
-X DELETE https://api.tacc.utexas.edu/profiles/v2/testuser
{
"message": "User deleted successfully.",
"result": {},
"status": "success",
"version": "2.0.0-SNAPSHOT-rc3fad"
}
To delete an existing user, make a DELETE request on their profile resource.
Deleting a user is a destructive action and cannot be undone. Consider the implications of user deletion and the impact on their existing metadata before doing so.
Registration Web Application¶
The account creation web app provides a simple form to enable user self-signup.
The web application also provides an email loop for verification of new accounts. The code is open source and freely available from bitbucket: Account Creation Web Application
Most likely you will want to customize the branding and other aspects of the application, but for simple use cases, the Tapis team can deploy a stock instance of the application in your tenant. Work with the Tapis developer team if this is of interest to your organization.
UUID¶
/$$ /$$ /$$ /$$ /$$$$$$ /$$$$$$$
| $$ | $$| $$ | $$|_ $$_/| $$__ $$
| $$ | $$| $$ | $$ | $$ | $$ $$
| $$ | $$| $$ | $$ | $$ | $$ | $$
| $$ | $$| $$ | $$ | $$ | $$ | $$
| $$ | $$| $$ | $$ | $$ | $$ | $$
| $$$$$$/| $$$$$$/ /$$$$$$| $$$$$$$/
\______/ \______/ |______/|_______/
The Tapis UUID service resolves the type and representation of one or more Tapis UUIDs. This is helpful, for instance, when you need to expand the hypermedia response of another resource, get the URL corresponding to a UUID, or fetch the representations of multiple resources in a single request.
Resolving a single UUID¶
Resolving a uuid
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
https://api.tacc.utexas.edu/uuid/v2/0001409758089943-5056a550b8-0001-002
The response will look something like this:
{
"uuid":"0001409758089943-5056a550b8-0001-002",
"type":"FILE",
"_links":{
"file":{
"href":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
}
}
}
A single UUID can be resolved by making a GET request on the UUID resource. The response will include the UUID and the type of the resource to which it is associated. The canonical resource URL is available in the hypermedia response. All calls to the UUID API are authenticated; however, no permission checks are made when doing basic resolution.
Expanding a UUID query¶
Resolving a uuid to a full resource representation
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
"https://api.tacc.utexas.edu/uuid/v2/0001409758089943-5056a550b8-0001-002?expand=true&pretty=true"
The response will include the entire representation of the resource just as if you queried the Files API.
{
"internalUsername":null,
"lastModified":"2014-09-03T10:28:09.943-05:00",
"name":"picksumipsum.txt",
"nativeFormat":"raw",
"owner":"nryan",
"path":"/home/nryan/picksumipsum.txt",
"source":"http://127.0.0.1/picksumipsum.txt",
"status":"STAGING_QUEUED",
"systemId":"data.iplantcollaborative.org",
"uuid":"0001409758089943-5056a550b8-0001-002",
"_links":{
"history":{
"href":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
},
"self":{
"href":"https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
},
"system":{
"href":"https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
}
}
}
Oftentimes you need more information about the resource associated with a UUID. You can save yourself an API request by adding expand=true to the URL query. The resulting response, if successful, will include the full representation of the resource associated with the UUID, just as if you had called its URL directly. Filtering is also supported, so you can specify just the fields you want returned in the response.
Resolving multiple UUID¶
Resolving multiple UUID.
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
"https://api.tacc.utexas.edu/uuid/v2/?uuids.eq=0001409758089943-5056a550b8-0001-002,0001414144065563-5056a550b8-0001-007&pretty=true"
The response will be similar to the following.
[
{
"uuid":"0001409758089943-5056a550b8-0001-002",
"type":"FILE",
"url":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt",
"_links":{
"file":{
"href":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
}
}
},
{
"uuid":"0001414144065563-5056a550b8-0001-007",
"type":"JOB",
"url":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007",
"_links":{
"file":{
"href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
}
}
}
]
To resolve multiple UUIDs, make a GET request on the UUID collection and pass the UUIDs as a comma-separated list to the uuids.eq query parameter. The response will contain a list of resolved resources in the same order that you requested them.
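Composing the bulk-resolution URL correctly means joining the parameters with a single "?" followed by "&" separators. A Python sketch, with API_HOST and the helper name as placeholders:

```python
from urllib.parse import urlencode

API_HOST = "api.tacc.utexas.edu"  # placeholder host

def uuid_bulk_url(uuids, expand=False, pretty=False):
    """Build the UUID bulk-resolution URL; note the single "?" and
    "&"-joined parameters after it."""
    params = {"uuids.eq": ",".join(uuids)}
    if expand:
        params["expand"] = "true"
    if pretty:
        params["pretty"] = "true"
    # safe="," keeps the comma-separated UUID list readable in the URL
    return f"https://{API_HOST}/uuid/v2/?{urlencode(params, safe=',')}"

url = uuid_bulk_url(
    ["0001409758089943-5056a550b8-0001-002",
     "0001414144065563-5056a550b8-0001-007"],
    expand=True, pretty=True,
)
```

Remember to quote the resulting URL when passing it to curl, since unquoted "&" characters are interpreted by the shell.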
Expanding multiple UUID¶
Resolving multiple UUID to their resource representations
curl -sk -H "Authorization: Bearer $ACCESS_TOKEN" \
"https://api.tacc.utexas.edu/uuid/v2/?uuids.eq=0001409758089943-5056a550b8-0001-002,0001414144065563-5056a550b8-0001-007&expand=true&pretty=true"
The response will include an array of the expanded representations in the order they were requested in the URL query.
[
{
"id":"$JOB_ID",
"name":"demo-pyplot-demo-advanced test-1414139896",
"owner":"$API_USERNAME",
"appId":"demo-pyplot-demo-advanced-0.1.0",
"executionSystem":"$PUBLIC_EXECUTION_SYSTEM",
"batchQueue":"debug",
"nodeCount":1,
"processorsPerNode":1,
"memoryPerNode":1.0,
"maxRunTime":"01:00:00",
"archive":false,
"retries":0,
"localId":"10321",
"outputPath":null,
"status":"STOPPED",
"submitTime":"2014-10-24T04:48:11.000-05:00",
"startTime":"2014-10-24T04:48:08.000-05:00",
"endTime":null,
"inputs":{
"dataset":"agave://$PUBLIC_STORAGE_SYSTEM/$API_USERNAME/inputs/pyplot/testdata.csv"
},
"parameters":{
"chartType":"bar",
"height":"512",
"showLegend":"false",
"xlabel":"Time",
"background":"#FFF",
"width":"1024",
"showXLabel":"true",
"separateCharts":"false",
"unpackInputs":"false",
"ylabel":"Magnitude",
"showYLabel":"true"
},
"_links":{
"self":{
"href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007"
},
"app":{
"href":"https://api.tacc.utexas.edu/apps/v2/demo-pyplot-demo-advanced-0.1.0"
},
"executionSystem":{
"href":"https://api.tacc.utexas.edu/systems/v2/$PUBLIC_EXECUTION_SYSTEM"
},
"archiveData":{
"href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/outputs/listings"
},
"owner":{
"href":"https://api.tacc.utexas.edu/profiles/v2/$API_USERNAME"
},
"permissions":{
"href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/pems"
},
"history":{
"href":"https://api.tacc.utexas.edu/jobs/v2/0001414144065563-5056a550b8-0001-007/history"
},
"metadata":{
"href":"https://api.tacc.utexas.edu/meta/v2/data/?q=%7b%22associationIds%22%3a%220001414144065563-5056a550b8-0001-007%22%7d"
},
"notifications":{
"href":"https://api.tacc.utexas.edu/notifications/v2/?associatedUuid=0001414144065563-5056a550b8-0001-007"
}
}
},
{
"internalUsername":null,
"lastModified":"2014-09-03T10:28:09.943-05:00",
"name":"picksumipsum.txt",
"nativeFormat":"raw",
"owner":"nryan",
"path":"/home/nryan/picksumipsum.txt",
"source":"http://127.0.0.1/picksumipsum.txt",
"status":"STAGING_QUEUED",
"systemId":"data.iplantcollaborative.org",
"uuid":"0001409758089943-5056a550b8-0001-002",
"_links":{
"history":{
"href":"https://api.tacc.utexas.edu/files/v2/history/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
},
"self":{
"href":"https://api.tacc.utexas.edu/files/v2/media/system/data.iplantcollaborative.org/nryan/picksumipsum.txt"
},
"system":{
"href":"https://api.tacc.utexas.edu/systems/v2/data.iplantcollaborative.org"
}
}
}
]
Expansion also works when querying UUIDs in bulk. Simply add expand=true to the URL query in your request, and the full resource representation of each UUID will be returned in an array with the original request order maintained. If any of the resolutions fails due to a permission violation or server error, the error response object will be provided in place of the resource representation.
Events¶
/$$$$$$$$ /$$
| $$_____/ | $$
| $$ /$$ /$$/$$$$$$ /$$$$$$$ /$$$$$$ /$$$$$$$
| $$$$| $$ /$$/$$__ $| $$__ $|_ $$_/ /$$_____/
| $$__/ $$/$$| $$$$$$$| $$ $$ | $$ | $$$$$$
| $$ $$$/| $$_____| $$ | $$ | $$ /$\____ $$
| $$$$$$$ $/ | $$$$$$| $$ | $$ | $$$$/$$$$$$$/
|________/\_/ \_______|__/ |__/ \___/|_______/
Events underpin everything in the Tapis Platform. This section covers the events available to each resource.
Search¶
Search is a fundamental feature of the Tapis Platform. Most of the core science APIs support a mature, URL-based query mechanism allowing you to search using a SQL-inspired JSON syntax. The two exceptions are the Files and Metadata APIs. The Files service does not index the directory or file contents of registered systems, so there is no way for it to performantly search the file system. The Metadata service supports MongoDB query syntax, thus allowing more flexible, and slightly more complex, querying syntax.
Query syntax¶
http://api.tacc.utexas.edu/jobs/v2?name=test%20job
You can include multiple search expressions to build a more restrictive query.
http://api.tacc.utexas.edu/jobs/v2?name=test%20job&executionSystem=aws-demo&status=FAILED
By default, search is enabled on each collection endpoint allowing you to trim the response down to the results you care about most. The list of available search terms is identical to the attributes included in the JSON returned when requesting the full resource description.
To search for a specific attribute, you simply append a search expression into the URL query of your request. For example:
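If you are building these URLs programmatically, the search values must be percent-encoded. A minimal sketch using Python's standard library (the host and endpoint are taken from the example above):

```python
from urllib.parse import urlencode, quote

# Build the example query: jobs whose name is "test job".
# quote_via=quote encodes the space as %20 rather than "+".
query = urlencode({"name": "test job"}, quote_via=quote)
url = f"https://api.tacc.utexas.edu/jobs/v2?{query}"
print(url)  # https://api.tacc.utexas.edu/jobs/v2?name=test%20job
```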
Search operators¶
By default, all search expressions are evaluated for equality. To perform more complex queries, append a search operator to the attribute in your search expression. The following examples should help clarify:
# systems with cloud in their name
systems/v2?name.like=*cloud*
# jobs with status equal to PENDING or ARCHIVING
jobs/v2?status.in=PENDING,ARCHIVING
# systems with cloud in their name
tapis systems search --name like '*cloud*'
# jobs with status equal to PENDING
tapis jobs search --status eq 'PENDING'
For resources with nested collections, you may use JSON dot notation to query the subresources in the collection.
# systems using Amazon S3 as the storage protocol
systems/v2?storage.protocol.eq=S3
# systems with a batch queue allowing more than 10 concurrent user jobs
systems/v2?queues.maxUserJobs.gt=10
# systems using Amazon S3 as the storage protocol
systems-search 'storage.protocol.eq=S3'
# systems with a batch queue allowing more than 10 concurrent user jobs
systems-search 'queues.maxUserJobs.gt=10'
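A small helper can assemble these `attribute.operator=value` expressions, including the dot notation for nested collections. This is an illustrative sketch, not part of any Tapis SDK:

```python
from urllib.parse import urlencode, quote

def search_query(*terms):
    """Assemble Tapis-style search expressions.

    Each term is an (attribute, operator, value) triple; dotted attribute
    names reach into nested collections, as described above.
    """
    params = {f"{attr}.{op}": value for attr, op, value in terms}
    return urlencode(params, quote_via=quote)

q = search_query(("storage.protocol", "eq", "S3"),
                 ("queues.maxUserJobs", "gt", "10"))
print(f"systems/v2?{q}")  # systems/v2?storage.protocol.eq=S3&queues.maxUserJobs.gt=10
```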
Multiple operators¶
As before, you can include multiple search expressions to narrow your results.
# jobs whose app has hadoop in the name, ran on an execution system with id aws-demo, and status is equal to FINISHED
jobs/v2?appId.like=*hadoop*&executionSystem.eq=aws-demo&status.eq=FINISHED
# jobs whose app has hadoop in the name, ran on an execution system with id aws-demo, and status is equal to FINISHED
tapis jobs search --app-id like 'hadoop' --system-id eq 'aws-demo' --status eq 'FINISHED'
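The combined query above can be assembled the same way. Note that the `*` wildcard must survive percent-encoding, so it is marked safe here:

```python
from urllib.parse import urlencode, quote

# Combine several attribute.operator expressions into one query string.
# safe="*" keeps the like-operator wildcards from being encoded as %2A.
params = {"appId.like": "*hadoop*",
          "executionSystem.eq": "aws-demo",
          "status.eq": "FINISHED"}
query = urlencode(params, quote_via=quote, safe="*")
print(f"jobs/v2?{query}")
```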
The full list of search operators is given in the following table.
Operator | Values | Description |
---|---|---|
eq | mixed | Matches values equal to the given search value. All comparisons are case sensitive. This cannot be used for complex object comparison. |
neq | mixed | Matches values not equal to the given search value. All comparisons are case sensitive. This cannot be used for complex object comparison. |
lt | mixed | Matches values less than the given search value. |
lte | mixed | Matches values less than or equal to the given search value. |
gt | mixed | Matches values greater than the given search value. |
gte | mixed | Matches values greater than or equal to the given search value. |
in | comma-separated list | Matches values in the given comma-separated list. This is equivalent to applying the like operator to each comma-separated value. |
nin | comma-separated list | Matches values not in the given comma-separated list. This is equivalent to applying the nlike operator to each comma-separated value. |
like | string | Matches values similar to the given search term. Wildcards (*) may be used to perform partial matches. |
nlike | string | Matches values dissimilar to the given search term. Wildcards (*) may be used to perform partial matches. |
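To build intuition for the `like` operator's `*` wildcard, the matching is analogous to shell-style globbing. The server performs the real matching; the snippet below is only a case-sensitive, client-side analogue using Python's `fnmatch`:

```python
from fnmatch import fnmatchcase

# Client-side analogue of name.like=*cloud* (case sensitive, per the
# table above; the actual matching happens server-side).
names = ["cloud-runner", "hpc-stampede", "my-cloud-vm"]
matches = [n for n in names if fnmatchcase(n, "*cloud*")]
print(matches)  # ['cloud-runner', 'my-cloud-vm']
```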
Custom search result¶
jobs/v2?appId.like=cloud&executionSystem.like=docker&filter=id,appId,executionSystem,status,created&naked=true&limit=3
The response will be a JSON array of custom objects comprised of only the fields you specified in the filter
query parameter.
[
{
"id":"2974032102330798566-242ac115-0001-007",
"appId":"cloud-runner-0.1.0u1",
"executionSystem":"docker.tacc.utexas.edu",
"status":"FINISHED",
"created":"2016-11-03T16:04:53.000-05:00"
},
{
"id":"8643408718823550490-242ac115-0001-007",
"appId":"cloud-runner-0.1.0u1",
"executionSystem":"docker.tacc.utexas.edu",
"status":"FINISHED",
"created":"2016-11-03T15:17:24.000-05:00"
},
{
"id":"9049010248689521126-242ac115-0001-007",
"appId":"cloud-runner-0.1.0u1",
"executionSystem":"docker.tacc.utexas.edu",
"status":"FINISHED",
"created":"2016-11-03T15:17:07.000-05:00"
}
]
By combining the search, filtering, and naked query parameters, you can query the API and return just the information you care about. The example search will return a JSON array of job objects with just the id, appId, executionSystem, status, and created fields from the full job object in the response. This combination of search, filtering, and pagination provides a powerful mechanism for generating custom views of the data.
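Putting the pieces together, the custom-view query above can be assembled in one place. The parameter names come from this section; commas in the filter value are marked safe so they survive encoding:

```python
from urllib.parse import urlencode, quote

# Search terms, a filter listing the fields to return, naked=true to
# strip the response wrapper, and limit=3 for pagination.
params = {
    "appId.like": "cloud",
    "executionSystem.like": "docker",
    "filter": "id,appId,executionSystem,status,created",
    "naked": "true",
    "limit": "3",
}
query = urlencode(params, quote_via=quote, safe=",")
print(f"jobs/v2?{query}")
```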
Tooling¶
Sometimes the hardest part of a new project is taking the first step. Tapis Tooling helps make taking that first step a little easier through reference web applications, boilerplate integration scripts, and integrations with popular CMSes and frameworks through native plugins and modules.
Jupyter Hub¶
Jupyter notebooks (formerly iPython notebooks) provide users with interactive computing documents that contain both computer code and a mix of rich text elements such as data visualizations, text paragraphs, hyperlinks, formatted equations, etc. The code cells in notebooks can be executed interactively, cell by cell, and the results of the executions are displayed in subsequent cells in the notebook. The notebooks can also be exported to a serialized JSON formatted file and executed like a traditional program.
JupyterHub is an open source project that provides multi-user hosted notebook servers as a service. When a user signs in to JupyterHub, a notebook server with pre-configured software is automatically launched for them. The Tapis team integrated JupyterHub into its identity and access management stack and made several other enhancements and customizations to enable the use of the Tapis language SDKs, such as agavepy and the CLI, persistent storage, and multiple kernel support, directly from notebooks with minimal setup. The Tapis deployment of JupyterHub, which runs each user's notebook server in a Docker container to further enhance reproducibility, is freely available for use in the Tapis Public Tenant.
You can get started with JupyterHub today at https://jupyter.tacc.cloud.
Command Line Interface¶
The Tapis command-line interface (CLI) is a complete interface to the Tapis REST API. The scripts include support for creating persistent authentication sessions, creating/renaming apps, registering and sharing systems, uploading and managing data, creating PostIts, and more. Whether you have an existing project looking to leverage Tapis for back-end processing, want to integrate Tapis into existing scripted solutions, or are new to Tapis and just want to kick the tires, the Tapis CLI is a powerful tool. The Tapis CLI can be checked out from the Tapis git repository.
git clone https://github.com/TACC-Cloud/tapis-cli.git
For more information on using the Tapis CLI in common tasks, please consult the Tutorials section, which references it in all of its examples, or check out the Tapis Samples project for sample data and examples of how to use it to populate and interact with your tenant. You can also check out the Tapis CLI Documentation.