django-memento-framework

A set for helpers for Django web sites to enable the Memento framework for time-based access.

How does it work?

Django’s class-based views are used to retrieve archived data and format it according to Memento’s rules.

The generic views in this package can be used to quickly:

  • Publish a TimeMap that lists all of the archived versions of each URL in your archive
  • Host a TimeGate that handles requests that include a URL and a timestamp, redirecting to the detail view for the nearest archive
  • Enrich detail views in the archive to include the extra metadata required by the Memento system

Documentation

Getting started

Installation

Before you begin, you should have a Django project created and configured.

In­stall our library from PyPI, like so:

$ pip install django-memento-framework

Edit your settings.py and add this app to your INSTALLED_APPS list.

IN­STALLED_APPS = (
    # ...
    # other apps would be above this of course
    # ...
    'memento',
)

Example project

For the purposes of this introduction, imagine a Django project that archives a list of URLs each hour. A simple set of models could look something like this.

from django.db import models


class Page(models.Model):
    """
    A web page that is archived each hour.
    """
    url = models.URLField()


class ArchivedHTML(model.Model):
    """
    A HTML snapshot of a single web page.
    """
    page = models.ForeignKey(Page)
    timestamp = models.DateTimeField()
    html = models.TextField()

    def get_absolute_url(self):
        return '/%s/' % self.pk

We won’t go into all the code you’d need for this archive to actually work, but I bet you can imagine the sort of thing you’d need. Imagine it’s there and it’s been archiving URLs every hour for a few years.

TimeMaps

A good first step is create a TimeMap that will index all of the archives in your database. This is much like the sitemaps created for traditional search engines, but expanded to include datetime metadata.

This library includes a generic view that will publish a queryset in Memento’s link format, that can be optionally paginated. It is designed to emulate Django’s built-in feed framework and operates in much the same way.

You could start by create a pattern in urls.py that will accept a web page’s URL with the goal of returning all the snapshots for that resource in our example archive.

url(
    r'^timemap/(?P<url>.*)$',
    views.ExampleTimemapLinkList(),
    name="timemap-archivedhtml"
),

That URL now needs to connect to a view named ExampleTimemapLinkList so let’s go make it in your views.py file. It will inherit from our custom class and include configuration to pull the Page object from the database that matches the submitted URL and convert a list of all its archived HTML into a Timemap.

from memento.timemap import TimemapLinkList


class ExampleTimemapLinkList(TimemapLinkList):
    paginate_by = 1000

    def get_object(self, request, url):
        """
        Take the URL from the request and check if we have it in our Page model.
        """
        return get_object_or_404(Page, url=url)

    def get_original_url(self, obj):
        """
        Return the original URL being archived from the Page model after its been pulled.
        """
        return obj.url

    def memento_list(self, obj):
        """
        Pull a queryset set of all the ArchivedHTML records of the Page object.
        """
        return ArchivedHTML.objects.filter(page=obj)

    def memento_datetime(self, item):
        """
        Return the datetime when the archived was saved from each ArchivedURL object.
        """
        return item.timestamp

Now if you fired up your test server and visited http://localhost:8000/timemap/link/http://example.com/ You’d get a paginated index of all the archives saved from the page example.com.

TimeGate

The natural next step is to create the other pillar of the Memento system, a TimeGate. It is a view that when given a request that includes a URL and a timestamp will redirect to the detail page for the nearest archive.

A good first step is to create an URL in urls.py.

url(
    r'^timegate/(?P<url>.*)$',
    views.ExampleTimeGateView.as_view(),
    name="timegate"
),

Next you need to create the view included there.

class ExampleTimeGateView(TimeGateView):
    model = ArchivedHTML
    url_field = 'page__url' # You can walk across ForeignKeys like normal
    datetime_field = 'timestamp'
    timemap_pattern_name = "timemap-archived" # The name of the timemap URL we created above

This will now return a 302 redirect when given a URL in the request with a header of Accept-Datetime. A good way to check out if this is working is to use cURL on the command line.

$ url -X HEAD -i http://localhost:8000/timegate/http://archivedsite.com/ --header "Accept-Datetime: Fri, 1 May 2015 00:01:00 GMT"
HTTP/1.1 302 Moved Temporarily
Server: Apache/2.2.22 (Ubuntu)
Link: <http://archivedsite.com/>; rel="original", <http://localhost:800/timemap/link/http://archivedsite.com/>; rel="timemap"; type="application/link-format"
Location: http://www.example.com/screenshot/100/
Vary: accept-datetime

Memento detail view

The detail page for each snapshot in the archive should also be enriched to include Memento metadata in its response. This is done by using our extended version of Django’s generic DetailView.

from memento.timegate import MementoDetailView


class ExampleMementoDetailView(MementoDetailView):
    model = ArchivedHTML
    datetime_field = 'timestamp'
    timemap_pattern_name = "timemap-archivedhtml"
    timegate_pattern_name = "timegate"

    def get_original_url(self, obj):
        return obj.page.url

That can be linked to a pattern in the urls.py file like any other Django detail view.

url(
    r'^archived-html/(?P<pk>\d+)/$',
    views.ExampleMementoDetailView.as_view(),
    name='archivedhtml-detail'
)

And that’s it! Get all that going and your archive is ready to be indexed by http://mementoweb.org/ and integrated into Memento-based services.

Generic views

MementoDetailView

class MementoDetailView(DetailView)

Extends Django’s generic DetailView to describe an archived resource.

A set of extra headers is added to the response. They are:

  • The Memento-Datetime when the resource was archived.
  • A Link that includes the URL of the original resource as well as the location of the TimeMap the publishes the directory of all versions archived by your site and a TimeGate where a datetime can be submitted to find the closest mementos for this resource.
datetime_field

A string attribute that is the name of the database field that contains the timestamp when the resource was archived. Default 'datetime'.

timemap_pattern_name

The name of the URL pattern for this site’s TimeMap that, given the original url, is able to reverse to return the location of the map that serves as the directory of all versions of this resource archived by your site. Optional.

timegate_pattern_name

The name of the URL pattern for this site’s TimeGate that, given the original url, is able to reverse to return the location of the url where a datetime can be submitted to find the closest mementos for this resource. Optional.

get_original_url()

A method that, given the object being rendered by the view, will return the original URL of the archived resource.

Example myapp/views.py

from memento.timegate import MementoDetailView


class ExampleMementoDetailView(MementoDetailView):
    """
    A Memento-enabled detail page for a screenshot in my example
    archive.

    It is linked to a url that looks like something like:

        url(
            r'^screenshot/(?P<pk>\d+)/$',
            views.ExampleMementoDetailView.as_view(),
            name='screenshot-detail'
        )

    """
    model = Screenshot
    datetime_field = 'timestamp'
    timemap_pattern_name = "timemap-screenshot"
    timegate_pattern_name = "timegate-screenshot"

    def get_original_url(self, obj):
        return obj.site.url

Example response

$ curl -X HEAD -i http://www.example.com/screenshot/100/
HTTP/1.1 200 OK
Server: Apache/2.2.22 (Ubuntu)
Link: <http://archivedsite.com/>; rel="original", <http://www.example.com/timemap/link/http://archivedsite.com/>; rel="timemap"; type="application/link-format", <http://www.example.com/timegate/http://archivedsite.com/>; rel="timegate"
Memento-Datetime: Fri, 1 May 2015 00:00:01 GMT

TimeGateView

class TimeGateView(RedirectView)

Creates a TimeGate that handles a request with a ‘Accept-Datetime’ headers and returns a response that redirects to the corresponding Memento.

model

A Django database model where the object will be drawn with a Model.objects.filter() query. Optional. If you want to provide a more specific list, define the queryset attribute instead.

queryset

The list of objects that will be provided to the template. Can be any iterable of items, not just a Django queryset. Optional, but if this attribute is not defined the model attribute must be defined.

datetime_field

A string attribute that is the name of the database field that contains the timestamp when the resource was archived. Default 'datetime'.

url_kwarg

The name for the keyword argument in the URL pattern that will be used to filter the queryset down to objects archived for the resource. Default 'url'.

url_field

A string attribute that is the name of the database field that contains the original URL archived. Defailt 'url'.

timemap_pattern_name

The name of the URL pattern for this site’s TimeMap that, given the original url, is able to reverse to return the location of the map that serves as the directory of all versions of this resource archived by your site. Optional.

Example myapp/views.py

from memento.timegate import TimeGateView


class ExampleTimeGateView(TimeGateView):
    """
    A Memento TimeGate that, given a timestamp, will redirect to the detail page for a screenshot in my example archive

    It is linked to a url that looks like something like:

        url(
            r'^timegate/(?P<url>.*)$',
            views.ExampleTimeGateView.as_view(),
            name="timegate"
        ),

    """
    model = Screenshot
    url_field = 'site__url' # You can walk across ForeignKeys like normal
    datetime_field = 'timestamp'
    timemap_pattern_name = "timemap-screenshot"

Example response

$ curl -X HEAD -i http://www.example.com/timegate/http://archivedsite.com/ --header "Accept-Datetime: Fri, 1 May 2015 00:01:00 GMT"
HTTP/1.1 302 Moved Temporarily
Server: Apache/2.2.22 (Ubuntu)
Link: <http://archivedsite.com/>; rel="original", <http://www.example.com/timemap/link/http://archivedsite.com/>; rel="timemap"; type="application/link-format"
Location: http://www.example.com/screenshot/100/
Vary: accept-datetime