Note: This is adapted from a conversation that happened on the Make WP Slack (see the link for more context).
Our current situation regarding thumbnails
Thumbnails were collected by providers and put in the thumbnail
column in the catalog. We have data there now, and the column has not been dropped. Since this column also exists in the API, the data from it (for both audio and image) gets copied into the thumbnail
column there as well. However, only the Audio model uses the thumbnail
column, the Image model ignores it and tries to create a scaled down thumbnail using the primary image URL. The MediaStore class in the provider has been modified so that no new thumbnail URLs can be added to the thumbnail
column, but they can be added in the metadata under meta_data["thumbnail_url"]
.
We have run into scenarios recently where our own attempts to create thumbnails from the full-size image URLs have caused consistent timeouts and prevent the records from being shown in search results. SMK is a recent example of this: (1) (2). It may be advantageous to use the provider’s pre-computed thumbnails in these cases. Crucially, not all providers supply or are expected to supply thumbnails. As such, it seems like we may want to reduce the width of our tables by removing this frequently-empty column.
Proposal
For image
: we first go through and copy all of the existing thumbnail
values in the catalog into meta_data["thumbnail_url"]
. Then we perform a new-table-without-column-swap and remove the thumbnail
column from from the table. This will help conserve space and reduce the width of the table. Then we update the image_view
materialized view to populate a thumbnail
column for the API An API or Application Programming Interface is a software intermediary that allows programs to interact with each other and share data in limited, clearly defined ways. using the meta_data
dictionary. An example of this is in the Audio view’s audio_set_foreign_identifier
. This will allow us to keep the thumbnail
column in the API but reduce space on the catalog, since a migration Moving the code, database and media files for a website site from one server to another. Most typically done when changing hosting companies. of that size on the API would likely be untenable at the moment. We’ll then make two changes to the logic on both ends:
- Add
thumbnail_url
handling logic in the ImageStore
which will put that value in the meta_data
dictionary automatically
- Change the Image model’s logic on the API to first try and use
image.thumbnail
(which has been copied from meta_data["thumbnail_url"]
during the data refresh), then try image.url
when creating the scaled down image
We could potentially do the same for Audio, it looks like about half of the records do not have thumbnails:
deploy@localhost:openledger> select count(*) from audio where thumbnail is null;
+--------+
| count |
|--------|
| 427124 |
+--------+
SELECT 1
Time: 4.354s (4 seconds), executed in: 4.328s (4 seconds)
deploy@localhost:openledger> select count(*) from audio;
+--------+
| count |
|--------|
| 949384 |
+--------+
SELECT 1
Time: 3.899s (3 seconds), executed in: 3.887s (3 seconds)
What do folks think about this approach? I think any effort to reduce the width of our DB tables is an important one, as it will make migrations & data management easier down the line.
#data-normalization, #postgres, #thumbnails