catalog.api.utils package#
Submodules#
catalog.api.utils.ccrel module#
Tools for embedding ccREL data into files using XMP.
ccREL stands for Creative Commons Rights Expression Language. XMP stands for Extensible Metadata Platform.
This implementation is specifically for embedding ccREL inside of images, but it could be extended to handle other types of content.
For more information, see the [ccREL W3 standard](https://www.w3.org/Submission/ccREL/).
- catalog.api.utils.ccrel.embed_xmp_bytes(image: BytesIO, work_properties)#
Embed ccREL metadata inside a file-like io.BytesIO object.
For our purposes, we assume that the file is an image.
- Parameters:
image – A BytesIO representation of an image.
work_properties – A dictionary with the keys ‘license_url’ and ‘attribution’. The keys ‘creator’ and ‘work_landing_page’ are optional (but highly recommended).
- Returns:
An io.BytesIO object containing XMP metadata.
catalog.api.utils.dead_link_mask module#
- catalog.api.utils.dead_link_mask.get_query_hash(s: Search) str #
Hash the search query using a deterministic algorithm.
Generates a deterministic Murmur3 or SHA256 hash from the serialized Search object using DeepHash so that two Search objects with the same content will produce the same hash.
- Parameters:
s – Search object to be serialized and hashed.
- Returns:
Serialized Search object hash.
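To illustrate why deterministic serialization matters here, the following is a minimal sketch that hashes a plain dictionary standing in for a serialized Search object. It swaps in `json.dumps` with sorted keys and `hashlib.sha256` instead of DeepHash, and the name `sketch_query_hash` is hypothetical, so it should be read as an analogy rather than the module's implementation.

```python
import hashlib
import json


def sketch_query_hash(query: dict) -> str:
    """Hash a query dictionary so that equal content gives an equal hash."""
    # Sorting keys removes any dependence on dictionary insertion order.
    serialized = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Two dictionaries with the same content but a different key order.
a = {"query": {"match": {"title": "cat"}}, "size": 20}
b = {"size": 20, "query": {"match": {"title": "cat"}}}
same_hash = sketch_query_hash(a) == sketch_query_hash(b)
```

Because the serialized form does not depend on key order, both dictionaries map to the same cache key.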
- catalog.api.utils.dead_link_mask.get_query_mask(query_hash: str) list[int] #
Fetch an existing query mask for a given query hash or return an empty one.
- Parameters:
query_hash – Unique value for a particular query.
- Returns:
Boolean mask as a list of integers (0 or 1).
- catalog.api.utils.dead_link_mask.save_query_mask(query_hash: str, mask: list)#
Save a query mask to redis.
- Parameters:
mask – Boolean mask as a list of integers (0 or 1).
query_hash – Unique value to be used as key.
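The save and fetch behaviour described above can be sketched with an in-memory dictionary standing in for Redis. The key format, the JSON encoding, and the `sketch_` function names are all assumptions made for illustration; the real module may store masks differently.

```python
import json

# Hypothetical in-memory stand-in for the Redis connection.
_fake_redis: dict[str, str] = {}


def _mask_key(query_hash: str) -> str:
    # Assumed key format; the real key layout is not documented here.
    return f"{query_hash}:dead_link_mask"


def sketch_save_query_mask(query_hash: str, mask: list[int]) -> None:
    """Store the mask as a JSON string under a key derived from the hash."""
    _fake_redis[_mask_key(query_hash)] = json.dumps(mask)


def sketch_get_query_mask(query_hash: str) -> list[int]:
    """Return the stored mask, or an empty list if none exists."""
    raw = _fake_redis.get(_mask_key(query_hash))
    return json.loads(raw) if raw is not None else []


sketch_save_query_mask("abc123", [1, 0, 1])
```

An unknown hash yields an empty mask, which matches the documented fallback of get_query_mask.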
catalog.api.utils.exceptions module#
- catalog.api.utils.exceptions.exception_handler(ex, context)#
Handle the exception raised in a DRF context.
See the [DRF docs](https://www.django-rest-framework.org/api-guide/exceptions/#custom-exception-handling).
- Parameters:
ex – the exception that has occurred
context – additional data about the context of the exception
- Returns:
the response to show for the exception
catalog.api.utils.oauth2_helper module#
- catalog.api.utils.oauth2_helper.get_token_info(token: str)#
Recover an OAuth2 application client ID and rate limit model from an access token.
- Parameters:
token – An OAuth2 access token.
- Returns:
If the token is valid, return the client ID associated with the token, the rate limit model, and the email verification status as a tuple; otherwise return (None, None, None).
catalog.api.utils.pagination module#
catalog.api.utils.scheduled_tasks module#
Cron-like tasks run at a set interval.
python3 manage.py runcrons will execute any scheduled tasks. This is intended to run on all instances of the server.
Even though there may be multiple instances of the server running, a job is guaranteed to execute only once. A job is not run unless it can acquire a lock in the cache (shared by all instances of the API).
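The locking idea can be sketched as follows, assuming the shared cache offers an atomic "set only if absent" operation. `FakeSharedCache`, `run_if_lock_acquired`, and the `cron_lock:` key prefix are hypothetical names for illustration, not the scheduler's actual API.

```python
import threading


class FakeSharedCache:
    """Hypothetical stand-in for a cache shared by all server instances."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}
        self._guard = threading.Lock()

    def add(self, key: str, value: str) -> bool:
        """Set the key only if it is absent; report whether it was set."""
        with self._guard:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def delete(self, key: str) -> None:
        with self._guard:
            self._data.pop(key, None)


def run_if_lock_acquired(cache: FakeSharedCache, job_code: str, job) -> bool:
    """Run the job only if this caller wins the lock; return whether it ran."""
    lock_key = f"cron_lock:{job_code}"
    if not cache.add(lock_key, "locked"):
        return False  # Another instance holds the lock, so skip this run.
    try:
        job()
    finally:
        # Release the lock so the next scheduled run can proceed.
        cache.delete(lock_key)
    return True
```

Any instance that loses the race simply skips the run, which is how a single execution per interval can be achieved without coordinating the instances directly.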
- class catalog.api.utils.scheduled_tasks.SaveCachedTrafficStats#
Bases:
CronJobBase
Handle recording of stats to cache and periodically persisting them in the DB.
Traffic statistics (view count, API usage) are stored in Redis for fast updates and retrieval. In order to ensure durability of statistics and minimize cache memory requirements, they are intermittently replicated to the database in small batches and subsequently evicted from the cache if they exceed a certain age. Recently updated view data is replicated but not evicted.
After traffic statistics have been stored in the database, they are replicated to Elasticsearch by es-syncer and used to compute trending views.
- MIN_NUM_FAILURES = 5#
- RUN_EVERY_MINS = 20#
- code = 'catalog.api.utils.scheduled_tasks.SaveCachedTrafficStats'#
- do()#
- schedule = <django_cron.Schedule object>#
catalog.api.utils.status_code_view module#
- catalog.api.utils.status_code_view.get_status_code_view(data, status_code=200)#
Get a class-based view that returns the same response on all HTTP methods.
This is useful for blanket discontinuation of API endpoints.
- Parameters:
data – the dictionary to serialize as the JSON response
status_code – the status code of the returned response
- Returns:
the class based view that returns the same response for all methods
catalog.api.utils.throttle module#
- class catalog.api.utils.throttle.AbstractAnonRateThrottle#
Bases:
SimpleRateThrottle
Limits the rate of API calls that may be made by anonymous users.
The IP address of the request will be used as the unique cache key.
- get_cache_key(request, view)#
Should return a unique cache-key which can be used for throttling. Must be overridden.
May return None if the request should not be throttled.
- logger = <Logger catalog.api.utils.throttle.AnonRateThrottle (INFO)>#
- class catalog.api.utils.throttle.AbstractOAuth2IdRateThrottle#
Bases:
SimpleRateThrottle
Ties a particular throttling scope from settings.py to a rate limit model.
See ThrottledApplication.rate_limit_model for an explanation of that concept.
- applies_to_rate_limit_model: str#
- get_cache_key(request, view)#
Should return a unique cache-key which can be used for throttling. Must be overridden.
May return None if the request should not be throttled.
- class catalog.api.utils.throttle.AnonThumbnailRateThrottle#
Bases:
AbstractAnonRateThrottle
- scope = 'anon_thumbnail'#
- class catalog.api.utils.throttle.BurstRateThrottle#
Bases:
AbstractAnonRateThrottle
- scope = 'anon_burst'#
- class catalog.api.utils.throttle.EnhancedOAuth2IdBurstRateThrottle#
Bases:
AbstractOAuth2IdRateThrottle
- applies_to_rate_limit_model: str = 'enhanced'#
- scope: str = 'enhanced_oauth2_client_credentials_burst'#
- class catalog.api.utils.throttle.EnhancedOAuth2IdSustainedRateThrottle#
Bases:
AbstractOAuth2IdRateThrottle
- applies_to_rate_limit_model: str = 'enhanced'#
- scope: str = 'enhanced_oauth2_client_credentials_sustained'#
- class catalog.api.utils.throttle.ExemptOAuth2IdRateThrottle#
Bases:
AbstractOAuth2IdRateThrottle
- applies_to_rate_limit_model: str = 'exempt'#
- scope: str = 'exempt_oauth2_client_credentials'#
- class catalog.api.utils.throttle.OAuth2IdBurstRateThrottle#
Bases:
AbstractOAuth2IdRateThrottle
- applies_to_rate_limit_model: str = 'standard'#
- scope: str = 'oauth2_client_credentials_burst'#
- class catalog.api.utils.throttle.OAuth2IdSustainedRateThrottle#
Bases:
AbstractOAuth2IdRateThrottle
- applies_to_rate_limit_model: str = 'standard'#
- scope: str = 'oauth2_client_credentials_sustained'#
- class catalog.api.utils.throttle.OAuth2IdThumbnailRateThrottle#
Bases:
AbstractOAuth2IdRateThrottle
- applies_to_rate_limit_model: str = 'standard'#
- scope: str = 'oauth2_client_credentials_thumbnail'#
- class catalog.api.utils.throttle.OnePerSecond#
Bases:
AbstractAnonRateThrottle
- rate = '1/second'#
- class catalog.api.utils.throttle.OneThousandPerMinute#
Bases:
AbstractAnonRateThrottle
- rate = '1000/min'#
- class catalog.api.utils.throttle.SustainedRateThrottle#
Bases:
AbstractAnonRateThrottle
- scope = 'anon_sustained'#
- class catalog.api.utils.throttle.TenPerDay#
Bases:
AbstractAnonRateThrottle
- rate = '10/day'#
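The rate attributes above ('1/second', '1000/min', '10/day') follow the "number/period" string format used by Django REST framework throttles. The sketch below approximates how such a string can be turned into a request count and a window in seconds, keyed on the first letter of the period. The function name `sketch_parse_rate` is hypothetical and this is not a copy of the framework's code.

```python
# Window length in seconds, keyed by the first letter of the period name.
_PERIOD_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def sketch_parse_rate(rate: str) -> tuple[int, int]:
    """Split 'N/period' into (allowed requests, window length in seconds)."""
    num, period = rate.split("/")
    return int(num), _PERIOD_SECONDS[period[0]]


one_per_second = sketch_parse_rate("1/second")
thousand_per_minute = sketch_parse_rate("1000/min")
ten_per_day = sketch_parse_rate("10/day")
```

Under this reading, OneThousandPerMinute allows 1000 requests in any 60-second window and TenPerDay allows 10 requests in any 86400-second window.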
catalog.api.utils.validate_images module#
- catalog.api.utils.validate_images.validate_images(query_hash: str, start_slice: int, results: list[elasticsearch_dsl.response.hit.Hit], image_urls: list[str]) None #
Make sure images exist before we display them.
Treat redirects as broken links since most of the time the redirect leads to a generic “not found” placeholder.
Results are cached in redis and shared amongst all API servers in the cluster.
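As a rough illustration of the redirect rule, the sketch below turns a list of HTTP status codes into a 0/1 mask like the one handled by the dead_link_mask module. It assumes that only a plain 200 response counts as a live link, and `sketch_link_mask` is a hypothetical name; the real function also handles caching and slicing, which are omitted here.

```python
def sketch_link_mask(status_codes: list[int]) -> list[int]:
    """Return 1 for each live link and 0 for each dead one."""
    # Assumption: only a plain 200 counts as live. Redirects (3xx) are
    # deliberately treated as dead, as are client and server errors.
    return [1 if code == 200 else 0 for code in status_codes]


mask = sketch_link_mask([200, 301, 404, 200])
```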
catalog.api.utils.watermark module#
- class catalog.api.utils.watermark.Dimension(value)#
Bases:
Flag
This enum represents the two dimensions of an image.
- BOTH = 3#
- HEIGHT = 1#
- NONE = 0#
- WIDTH = 2#
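Because Dimension derives from Flag, its members combine with bitwise operators. The sketch below re-declares the documented members purely for illustration (it does not import the real class) and shows that BOTH is the combination of HEIGHT and WIDTH.

```python
from enum import Flag


class Dimension(Flag):
    """Re-declaration of the documented members, for illustration only."""

    NONE = 0
    HEIGHT = 1
    WIDTH = 2
    BOTH = 3


# BOTH is the bitwise combination of the two single dimensions.
combined = Dimension.HEIGHT | Dimension.WIDTH
# Membership tests check whether a dimension is part of a combination.
has_width = Dimension.WIDTH in combined
```

NONE is falsy, so a check such as `if dimension:` can be used to skip work when no dimension is selected.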
- catalog.api.utils.watermark.watermark(image_url, info, draw_frame=True)#
Return a PIL Image with a watermark and embedded metadata.
- Parameters:
image_url – The URL of the image.
info – A dictionary with keys title, creator, license, and license_version
draw_frame – Whether to draw an attribution frame.
- Returns:
A PIL Image and its EXIF data, if included.
catalog.api.utils.waveform module#
- catalog.api.utils.waveform.cleanup(file_name)#
Delete the audio file after it has been processed.
- Parameters:
file_name – the name of the file to delete
- catalog.api.utils.waveform.download_audio(url, identifier)#
Download the audio from the given URL to a location on the disk.
- Parameters:
url – the URL to the file being downloaded
identifier – the identifier of the media object to name the file
- Returns:
the name of the file on the disk
- catalog.api.utils.waveform.ext_from_url(url)#
Get the file extension from the given URL.
Looks at the last part of the URL path, and returns the string after the last dot.
- Parameters:
url – the URL to the file whose extension is being determined
- Returns:
the file extension or None
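The described behaviour (take the last part of the URL path and return what follows the last dot) can be sketched with the standard library. `sketch_ext_from_url` is a hypothetical name, and edge cases such as a trailing dot may be handled differently by the real function.

```python
from urllib.parse import urlparse


def sketch_ext_from_url(url: str):
    """Return the text after the last dot in the final path segment, or None."""
    # urlparse separates the path from the query string and fragment.
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return None
    return last_segment.rsplit(".", 1)[-1]


ext = sketch_ext_from_url("https://example.com/audio/track.mp3?download=1")
```

Only the final path segment is inspected, so a dot in an earlier directory name or in the query string does not produce a false extension.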
- catalog.api.utils.waveform.generate_peaks(audio) list[float] #
- catalog.api.utils.waveform.generate_waveform(file_name, duration)#
Generate the waveform for the file by invoking the audiowaveform binary.
The Python module subprocess is used to execute the binary and get the results that it emits to STDOUT.
- Parameters:
file_name – the name of the downloaded audio file
duration – the duration of the audio to determine pixels per second
- catalog.api.utils.waveform.process_waveform_output(json_out)#
Parse the waveform output generated by the audiowaveform binary.
The output consists of alternating positive and negative values that are almost equal in amplitude. We discard the negative values. We also scale down the amplitudes by the largest value so that they lie in the range [0, 1].
- Parameters:
json_out – the JSON output generated by audiowaveform
- Returns:
the list of peaks
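The discard-and-scale step can be sketched as below. The sketch assumes the samples sit under a "data" key and arrive as (negative, positive) pairs, so every second value is kept. Both points are assumptions about the JSON layout, and `sketch_process_waveform_output` is a hypothetical name.

```python
import json


def sketch_process_waveform_output(json_out: str) -> list[float]:
    """Keep the positive half of each pair and scale into the range [0, 1]."""
    # Assumption: samples are stored under "data" as (negative, positive) pairs.
    samples = json.loads(json_out)["data"]
    positives = samples[1::2]
    largest = max(positives, default=0)
    if largest == 0:
        # Silent or empty audio: avoid dividing by zero.
        return [0.0 for _ in positives]
    return [value / largest for value in positives]


peaks = sketch_process_waveform_output('{"data": [-2, 2, -4, 4, -1, 1]}')
# peaks == [0.5, 1.0, 0.25]
```

Dividing by the largest amplitude means the loudest point always maps to 1.0, regardless of the bit depth used when generating the waveform.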