catalog.api.utils package#


catalog.api.utils.ccrel module#

Tools for embedding ccREL data into files using XMP.

ccREL stands for Creative Commons Rights Expression Language. XMP stands for Extensible Metadata Platform.

This implementation is specifically for embedding ccREL inside of images, but it could be extended to handle other types of content.

For more information, see the ccREL W3 standard.

catalog.api.utils.ccrel.embed_xmp_bytes(image: BytesIO, work_properties)#

Embed ccREL metadata inside a file-like io.BytesIO object.

For our purposes, we assume that the file is an image.

  • image – A BytesIO representation of an image.

  • work_properties – A dictionary with keys ‘license_url’ and ‘attribution’. ‘creator’ and ‘work_landing_page’ are optional (but highly recommended).

Returns: An io.BytesIO object containing XMP metadata.
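As an illustration of the expected input, the sketch below builds a `work_properties` dictionary and checks it for the two required keys before it would be passed to `embed_xmp_bytes`. The helper `missing_required_keys`, the example URLs, and the placeholder image bytes are hypothetical and not part of this package.

```python
from io import BytesIO

# Keys named in the parameter description above.
REQUIRED_KEYS = {"license_url", "attribution"}


def missing_required_keys(work_properties: dict) -> set:
    """Hypothetical helper: return required keys absent from the dictionary."""
    return REQUIRED_KEYS - set(work_properties)


work_properties = {
    "license_url": "https://creativecommons.org/licenses/by/4.0/",
    "attribution": "Example Photo by Example Author, licensed under CC BY 4.0",
    # Optional, but highly recommended:
    "creator": "Example Author",
    "work_landing_page": "https://example.com/photos/1",
}

# Placeholder bytes standing in for real image data.
image = BytesIO(b"placeholder image bytes")

# With the package importable, the call would then be:
#     result = embed_xmp_bytes(image, work_properties)
print(missing_required_keys(work_properties))
```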

catalog.api.utils.exceptions module#

catalog.api.utils.exceptions.exception_handler(ex, context)#

Handle the exception raised in a DRF context.

See the DRF docs.

  • ex – the exception that has occurred

  • context – additional data about the context of the exception


Returns: the response to show for the exception

catalog.api.utils.oauth2_helper module#

catalog.api.utils.oauth2_helper.get_token_info(token: str)#

Recover an OAuth2 application client ID and rate limit model from an access token.


  • token – An OAuth2 access token.

Returns: If the token is valid, the client ID associated with the token, the rate limit model, and the email verification status as a tuple; otherwise (None, None, None).
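A caller would typically unpack the returned tuple and branch on whether the client ID is None. The sketch below shows one way to do that; `describe_token` and the sample values are hypothetical and only illustrate the tuple shape described above.

```python
def describe_token(token_info: tuple) -> str:
    """Hypothetical helper: summarize the tuple shape described above."""
    client_id, rate_limit_model, verified = token_info
    if client_id is None:
        # An invalid token yields (None, None, None).
        return "anonymous"
    if not verified:
        return f"{client_id}: email not verified"
    return f"{client_id}: {rate_limit_model}"


print(describe_token((None, None, None)))
print(describe_token(("abc123", "standard", True)))
```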

catalog.api.utils.pagination module#

class catalog.api.utils.pagination.StandardPagination(*args, **kwargs)#

Bases: PageNumberPagination

page_query_param = 'page'#
page_size_query_param = 'page_size'#
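To show what the two query parameters mean for a client, the sketch below reads `page` and `page_size` from a URL and slices a list accordingly. It does not use DRF; `page_slice`, the example URL, and the default page size of 20 are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlparse

DEFAULT_PAGE_SIZE = 20  # assumed default, not taken from the package


def page_slice(url: str, items: list) -> list:
    """Return the items belonging to the page requested in the URL."""
    query = parse_qs(urlparse(url).query)
    page = int(query.get("page", ["1"])[0])
    page_size = int(query.get("page_size", [str(DEFAULT_PAGE_SIZE)])[0])
    start = (page - 1) * page_size  # pages are 1-indexed
    return items[start : start + page_size]


print(page_slice("https://api.example.com/images/?page=2&page_size=3", list(range(10))))
```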

catalog.api.utils.scheduled_tasks module#

Cron-like tasks run at a set interval.

python3 runcrons will execute any scheduled tasks. This is intended to run on all instances of the server.

Even though there may be multiple instances of the server running, a job is guaranteed to execute only once. A job does not run unless it can acquire a lock in the cache, which is shared by all instances of the API.
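The locking idea can be sketched with a set-if-absent operation, similar in spirit to the `add` method of Django's cache API. `FakeCache` is an in-memory stand-in for the shared cache, and `run_once`, the key format, and the 60-second lock duration are hypothetical.

```python
import time


class FakeCache:
    """In-memory stand-in for a cache shared by all server instances."""

    def __init__(self):
        self._store = {}

    def add(self, key, value, timeout):
        """Store the key only if it is absent or expired; report success."""
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False  # another instance already holds the lock
        self._store[key] = (value, now + timeout)
        return True


def run_once(cache, job_name, job, lock_seconds=60):
    """Run the job only if this caller acquires the lock."""
    if not cache.add(f"lock:{job_name}", "locked", lock_seconds):
        return False
    job()
    return True


runs = []
shared_cache = FakeCache()
first = run_once(shared_cache, "save_stats", lambda: runs.append("ran"))
second = run_once(shared_cache, "save_stats", lambda: runs.append("ran"))
print(first, second, runs)
```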

class catalog.api.utils.scheduled_tasks.SaveCachedTrafficStats#

Bases: CronJobBase

Handle recording of stats to cache and periodically persisting them in the DB.

Traffic statistics (view count, API usage) are stored in Redis for fast updates and retrieval. In order to ensure durability of statistics and minimize cache memory requirements, they are intermittently replicated to the database in small batches and subsequently evicted from the cache if they exceed a certain age. Recently updated view data is replicated but not evicted.

After traffic statistics have been stored in the database, they are replicated to Elasticsearch by es-syncer and used to compute trending views.

code = 'catalog.api.utils.scheduled_tasks.SaveCachedTrafficStats'#
schedule = <django_cron.Schedule object>#
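The replicate-then-evict behavior described for this class can be sketched with plain dictionaries standing in for Redis and the database. The function name, the data layout (key mapped to a count and a last-updated timestamp), and the batch size are all assumptions for illustration.

```python
def persist_and_evict(cache, database, now, max_age, batch_size=2):
    """Copy every cached stat to the database in small batches, then
    evict only the entries older than max_age."""
    keys = list(cache)
    evicted = []
    for start in range(0, len(keys), batch_size):
        for key in keys[start : start + batch_size]:
            count, last_updated = cache[key]
            database[key] = count  # replicate to durable storage
            if now - last_updated > max_age:
                del cache[key]  # stale entries leave the cache
                evicted.append(key)
    return evicted


cache = {"image-a": (5, 0.0), "image-b": (7, 95.0)}
database = {}
evicted = persist_and_evict(cache, database, now=100.0, max_age=10.0)
print(database, evicted, list(cache))
```

Here `image-b` was updated recently, so it is replicated but stays in the cache.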

catalog.api.utils.status_code_view module#

catalog.api.utils.status_code_view.get_status_code_view(data, status_code=200)#

Get a class-based view that returns the same response on all HTTP methods.

This is useful for blanket discontinuation of API endpoints.

  • data – the dictionary to serialize as the JSON response

  • status_code – the status code of the returned response


Returns: the class-based view that returns the same response for all methods
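The factory pattern behind this function can be sketched without Django: a function builds and returns a class whose HTTP-method handlers all share one implementation. `make_status_view` and its `(status_code, data)` return shape are hypothetical; the real function returns a class-based view that produces a JSON response.

```python
def make_status_view(data, status_code=200):
    """Build a class whose handlers all return the same response."""

    class StatusView:
        def _respond(self, request=None):
            # data and status_code come from the enclosing function call.
            return status_code, data

        # Every HTTP method is bound to the same handler.
        get = post = put = patch = delete = head = options = _respond

    return StatusView


GoneView = make_status_view({"detail": "This endpoint was discontinued."}, 410)
view = GoneView()
print(view.get())
```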

catalog.api.utils.throttle module#

class catalog.api.utils.throttle.AbstractAnonRateThrottle#

Bases: SimpleRateThrottle

Limits the rate of API calls that may be made by anonymous users.

The IP address of the request will be used as the unique cache key.

get_cache_key(request, view)#

Should return a unique cache-key which can be used for throttling. Must be overridden.

May return None if the request should not be throttled.

logger = <Logger catalog.api.utils.throttle.AnonRateThrottle (INFO)>#
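As a rough sketch of the keying idea, the function below derives a cache key from a throttle scope and the client IP address, returning None when the request should not be throttled. The function name and the key format are hypothetical.

```python
def anon_cache_key(scope, ip_address):
    """Return a cache key unique to this scope and IP, or None to skip."""
    if not ip_address:
        return None  # no identity available, so do not throttle
    return f"throttle_{scope}_{ip_address}"


print(anon_cache_key("anon_burst", "203.0.113.7"))
```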
class catalog.api.utils.throttle.AbstractOAuth2IdRateThrottle#

Bases: SimpleRateThrottle

Ties a particular throttling scope to a rate limit model.

See ThrottledApplication.rate_limit_model for an explanation of that concept.

applies_to_rate_limit_model: str#
get_cache_key(request, view)#

Should return a unique cache-key which can be used for throttling. Must be overridden.

May return None if the request should not be throttled.

class catalog.api.utils.throttle.AnonThumbnailRateThrottle#

Bases: AbstractAnonRateThrottle

scope = 'anon_thumbnail'#
class catalog.api.utils.throttle.BurstRateThrottle#

Bases: AbstractAnonRateThrottle

scope = 'anon_burst'#
class catalog.api.utils.throttle.EnhancedOAuth2IdBurstRateThrottle#

Bases: AbstractOAuth2IdRateThrottle

applies_to_rate_limit_model: str = 'enhanced'#
scope: str = 'enhanced_oauth2_client_credentials_burst'#
class catalog.api.utils.throttle.EnhancedOAuth2IdSustainedRateThrottle#

Bases: AbstractOAuth2IdRateThrottle

applies_to_rate_limit_model: str = 'enhanced'#
scope: str = 'enhanced_oauth2_client_credentials_sustained'#
class catalog.api.utils.throttle.ExemptOAuth2IdRateThrottle#

Bases: AbstractOAuth2IdRateThrottle

applies_to_rate_limit_model: str = 'exempt'#
scope: str = 'exempt_oauth2_client_credentials'#
class catalog.api.utils.throttle.OAuth2IdBurstRateThrottle#

Bases: AbstractOAuth2IdRateThrottle

applies_to_rate_limit_model: str = 'standard'#
scope: str = 'oauth2_client_credentials_burst'#
class catalog.api.utils.throttle.OAuth2IdSustainedRateThrottle#

Bases: AbstractOAuth2IdRateThrottle

applies_to_rate_limit_model: str = 'standard'#
scope: str = 'oauth2_client_credentials_sustained'#
class catalog.api.utils.throttle.OAuth2IdThumbnailRateThrottle#

Bases: AbstractOAuth2IdRateThrottle

applies_to_rate_limit_model: str = 'standard'#
scope: str = 'oauth2_client_credentials_thumbnail'#
class catalog.api.utils.throttle.OnePerSecond#

Bases: AbstractAnonRateThrottle

rate = '1/second'#
class catalog.api.utils.throttle.OneThousandPerMinute#

Bases: AbstractAnonRateThrottle

rate = '1000/min'#
class catalog.api.utils.throttle.SustainedRateThrottle#

Bases: AbstractAnonRateThrottle

scope = 'anon_sustained'#
class catalog.api.utils.throttle.TenPerDay#

Bases: AbstractAnonRateThrottle

rate = '10/day'#

catalog.api.utils.validate_images module#

catalog.api.utils.validate_images.validate_images(query_hash: str, start_slice: int, results: list[elasticsearch_dsl.response.hit.Hit], image_urls: list[str]) → None#

Make sure images exist before we display them.

Treat redirects as broken links since most of the time the redirect leads to a generic “not found” placeholder.

Results are cached in redis and shared amongst all API servers in the cluster.
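The validation rule can be sketched as follows: only a 200 response counts as a live image, so redirects and errors are filtered out, and each looked-up status is cached for reuse. `filter_live`, the `fetch_status` callable, and the dictionary cache are hypothetical stand-ins for the real HTTP requests and Redis.

```python
def filter_live(image_urls, fetch_status, cache):
    """Keep only URLs whose status is 200, caching each status."""
    live = []
    for url in image_urls:
        if url not in cache:
            cache[url] = fetch_status(url)  # a real version would make a request
        if cache[url] == 200:  # 3xx redirects count as broken links
            live.append(url)
    return live


fake_statuses = {
    "https://example.com/a.jpg": 200,
    "https://example.com/b.jpg": 302,
    "https://example.com/c.jpg": 404,
}
lookups = []


def fetch_status(url):
    lookups.append(url)
    return fake_statuses[url]


status_cache = {}
first_pass = filter_live(list(fake_statuses), fetch_status, status_cache)
second_pass = filter_live(list(fake_statuses), fetch_status, status_cache)
print(first_pass, len(lookups))
```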

catalog.api.utils.watermark module#

class catalog.api.utils.watermark.Dimension(value)#

Bases: Flag

This enum represents the two dimensions of an image.

BOTH = 3#
NONE = 0#
WIDTH = 2#
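Because `Dimension` derives from `Flag`, its members combine with bitwise operators. The sketch below re-declares the enum locally to show this. The `HEIGHT = 1` member is an assumption inferred from `BOTH = 3` and `WIDTH = 2`; it is not listed above.

```python
from enum import Flag


class Dimension(Flag):
    NONE = 0
    HEIGHT = 1  # assumed member, inferred from BOTH = 3 and WIDTH = 2
    WIDTH = 2
    BOTH = 3


combined = Dimension.WIDTH | Dimension.HEIGHT
print(combined == Dimension.BOTH)
print(Dimension.WIDTH in Dimension.BOTH)
print(bool(Dimension.NONE))
```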
catalog.api.utils.watermark.watermark(image_url, info, draw_frame=True)#

Return a PIL Image with a watermark and embedded metadata.

  • image_url – The URL of the image.

  • info – A dictionary with keys title, creator, license, and license_version.

  • draw_frame – Whether to draw an attribution frame.

Returns: A PIL Image and its EXIF data, if included.

catalog.api.utils.waveform module#


Delete the audio file after it has been processed.


  • file_name – the name of the file to delete

catalog.api.utils.waveform.download_audio(url, identifier)#

Download the audio from the given URL to a location on the disk.

  • url – the URL to the file being downloaded

  • identifier – the identifier of the media object to name the file


Returns: the name of the file on the disk


Get the file extension from the given URL.

Looks at the last part of the URL path, and returns the string after the last dot.


  • url – the URL to the file whose extension is being determined

Returns: the file extension or None
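The extension lookup described above can be sketched with the standard library. `extension_from_url` is a hypothetical name, and the handling of edge cases (such as dots in earlier path segments) is an assumption rather than the package's documented behavior.

```python
from urllib.parse import urlparse


def extension_from_url(url):
    """Return the text after the last dot in the last path segment, or None."""
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return None
    return last_segment.rsplit(".", 1)[-1]


print(extension_from_url("https://example.com/audio/track.mp3?download=1"))
```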

catalog.api.utils.waveform.generate_peaks(audio) → list[float]#
catalog.api.utils.waveform.generate_waveform(file_name, duration)#

Generate the waveform for the file by invoking the audiowaveform binary.

The Python module subprocess is used to execute the binary and get the results that it emits to STDOUT.

  • file_name – the name of the downloaded audio file

  • duration – the duration of the audio to determine pixels per second


Parse the waveform output generated by the audiowaveform binary.

The output consists of alternating positive and negative values that are almost equal in amplitude. We discard the negative values and scale the remaining amplitudes by the largest value so that they lie in the range [0, 1].


  • json_out – the JSON output generated by audiowaveform

Returns: the list of peaks
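The parsing step can be sketched as below. It assumes the JSON carries its samples in a `data` array of (negative, positive) pairs, so the positive values sit at the odd indices. `peaks_from_json` is a hypothetical name, and returning zeros for silent audio is an added safeguard, not documented behavior.

```python
import json


def peaks_from_json(json_out):
    """Keep the positive half of each pair and scale into [0, 1]."""
    data = json.loads(json_out)["data"]
    positives = data[1::2]  # assumed layout: negative value, then positive
    largest = max(positives, default=0)
    if largest == 0:
        # Avoid dividing by zero for silent or empty input.
        return [0.0 for _ in positives]
    return [value / largest for value in positives]


print(peaks_from_json('{"data": [-4, 4, -8, 8, -2, 2]}'))
```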

Module contents#