A week in Openverse: 2024-07-08 – 2024-07-15

openverse

Merged PRs

Catalog

  • #4552: Split `batched_update` DAG into automated and manual DAGs

Documentation

  • #4552: Split `batched_update` DAG into automated and manual DAGs

Frontend

  • #4584: Fix TypeError by checking if duration is not Finite before setting cu…
  • #4591: Remove all usage of jest

Management

  • #4560: Make `ov` base image updates overall more convenient and take immediate effect
  • #4591: Remove all usage of jest
  • #4606: Bump certifi from 2024.2.2 to 2024.7.4 in /automations/python
  • #4607: Bump tqdm from 4.66.2 to 4.66.3 in /utilities/generate_test_locales
  • #4608: Bump certifi from 2024.2.2 to 2024.7.4 in /utilities/generate_test_locales

Closed issues

Catalog

  • #4457: Separate the batched update DAG into manually and automatically triggered DAGs

Frontend

  • #4156: TypeError: Failed to set the 'currentTime' property on 'HTMLMediaElement': The provided double value is non-…

Management

  • #4545: Install pipx packages outside the dev-env volume

openverse-infrastructure

Merged PRs

Infra

  • #960: Fix Nuxt 3 frontend alarm
  • #966: Add all initial datasources to Grafana
  • #972: Block Internet Archive from single and search views
  • #976: Remove usage of `terraform-aws-modules/vpc/aws`

Management

  • #977: 🔄 synced file(s) with WordPress/openverse

Closed issues

Infra

  • #953: Set up Grafana Cloud for initial access and use by Openverse maintainers
  • #969: Remove `terraform-aws-modules/vpc/aws` usage

#openverse, #week-in-openverse

A week in Openverse: 2024-07-01 – 2024-07-08

openverse

Merged PRs

API

  • #4573: Fix Docker build warnings
  • #4590: Drop `status` column from media reports
  • #4593: Bump `schemathesis` to v3.31.0 and silence warnings on API tests

Catalog

  • #4573: Fix Docker build warnings

Documentation

  • #4571: Publish changelog for ingestion_server-2024.06.28.20.22.20
  • #4588: Publish changelog for frontend-2024.07.01.20.35.48
  • #4589: Add manual changelog for API 2024.07.01.20.35.48

Frontend

  • #4573: Fix Docker build warnings
  • #4574: Update dependency prettier-plugin-tailwindcss to v0.6.5
  • #4577: Update dependency @playwright/test to v1.45.0
  • #4578: Update dependency @swc/cli to ^0.4.0
  • #4580: Update dependency prettier to v3.3.2
  • #4581: Update dependency typescript to v5.5.2
  • #4583: Remove volta and engines.pnpm settings
  • #4588: Publish changelog for frontend-2024.07.01.20.35.48

Ingestion Server

  • #4571: Publish changelog for ingestion_server-2024.06.28.20.22.20
  • #4579: Update dependency elasticsearch to v8.14.0

Management

  • #4573: Fix Docker build warnings
  • #4575: Update workflows
  • #4592: Try adding ignoreScripts to fix pnpmfileChecksum update issue
  • #4604: Bump certifi from 2023.11.17 to 2024.7.4 in /utilities/project_planning

Closed issues

API

  • #3642: Drop `status` column from content report tables

Management

  • #4586: Renovate's pnpm is not updating pnpmfileChecksum

openverse-infrastructure

Merged PRs

Frontend

  • #965: Implement Nuxt's recommendations for Cloudflare configuration

Infra

  • #964: Update the bastion base image and allow it to pull updates without recreation
  • #965: Implement Nuxt's recommendations for Cloudflare configuration

Ingestion Server

  • #958: Bump ingestion server to rel-2024.06.28.20.22.20

Management

  • #963: 🔄 synced file(s) with WordPress/openverse

#openverse, #week-in-openverse

A week in Openverse: 2024-06-24 – 2024-07-01

openverse

Merged PRs

Analytics

  • #4550: Fix Plausible setup after domain was already set
  • #4568: Specify pull policy for `openverse-` images

API

  • #4499: Include xml in frontend attribution options
  • #4530: Drop FK constraint on media_obj in MediaDecisionThrough, update backfillmoderationdecision command
  • #4536: Do not use unstable pook reference for API
  • #4540: Add linting for Dockerfiles
  • #4544: Create sensitive and deleted media models for decisions
  • #4547: Publish changelog for api-2024.06.24.18.01.42
  • #4551: Shorten PDM hash to first 8 characters
  • #4554: Remove backfillmoderationdecision management command after production run
  • #4568: Specify pull policy for `openverse-` images

Catalog

  • #4475: Add DAG to decode and deduplicate image tags with escaped literal unicode sequences
  • #4495: Fix placing test S3 data into MinIO
  • #4497: Add CI/CD and PDM to new indexer worker
  • #4526: Fix separators in catalog and dev-env images and dev-env volume
  • #4540: Add linting for Dockerfiles
  • #4555: Ensure plpython3u exists in live db when using it
  • #4557: Remove `trim_and_deduplicate_tags` DAG after successful run
  • #4568: Specify pull policy for `openverse-` images

Documentation

  • #4417: Implementation Plan: Augment the catalog database with suitable Rekognition tags
  • #4475: Add DAG to decode and deduplicate image tags with escaped literal unicode sequences
  • #4546: combine frontend testing documentation pages
  • #4547: Publish changelog for api-2024.06.24.18.01.42
  • #4548: Publish changelog for frontend-2024.06.24.18.01.44
  • #4557: Remove `trim_and_deduplicate_tags` DAG after successful run
  • #4562: Publish changelog for frontend-2024.06.26.17.18.17

Frontend

  • #4291: Display generated tags separately
  • #4497: Add CI/CD and PDM to new indexer worker
  • #4499: Include xml in frontend attribution options
  • #4509: Replace "Over…" language with more precise "Top…"
  • #4516: Add caching to frontend Nginx configuration
  • #4523: Fix possible TypeError when accessing properties of `route.value`
  • #4540: Add linting for Dockerfiles
  • #4548: Publish changelog for frontend-2024.06.24.18.01.44
  • #4549: Re-add tags page text
  • #4559: Fix flaky VCollectionHeader snapshot tests
  • #4562: Publish changelog for frontend-2024.06.26.17.18.17

Infra

  • #4516: Add caching to frontend Nginx configuration

Ingestion Server

  • #4471: Remove single quotes in values of Ingestion Server's TSV files
  • #4529: Upload Ingestion Server's TSV files to AWS S3 (skip tags)

Management

  • #4497: Add CI/CD and PDM to new indexer worker
  • #4526: Fix separators in catalog and dev-env images and dev-env volume
  • #4539: Add dev tools jq and HTTPie to `ov`
  • #4540: Add linting for Dockerfiles
  • #4546: combine frontend testing documentation pages
  • #4568: Specify pull policy for `openverse-` images

Closed issues

API

  • #4430: Attribution: XML/RDF/Turtle please.
  • #4454: Determine if all tags in the catalog database have an associated provider
  • #4512: The `AbstractMediaDecisionThrough` class and its inheriting classes shouldn't use actual foreign keys to media tables
  • #4513: Creating `MediaDecision` has no effect on deindexed actions

Catalog

  • #663: Upgrade catalog to Python 3.11
  • #1464: Create a DAG to log and report code review response times
  • #4199: Remove and de-duplicate tags with leading/trailing whitespace
  • #4454: Determine if all tags in the catalog database have an associated provider
  • #4494: Test S3 inaturalist files are not found in MinIO

Documentation

  • #4040: Implementation Plan: Augment the catalog database with suitable Rekognition tags
  • #4514: Combine frontend testing documentation pages

Frontend

  • #461: Add a message to inform the user about more filters when one media type is chosen
  • #2130: Update sensitive browsing designs to allow re-blurring of search results
  • #2213: Frontend local dev error `Cannot convert undefined or null to object`
  • #4192: Displaying machine-generated content
  • #4379: Write a page describing the machine-generated tags for the frontend
  • #4430: Attribution: XML/RDF/Turtle please.
  • #4470: Add caching of static assets to frontend Nginx
  • #4522: TypeError: Cannot read properties of undefined (reading 'name') in `useMatchRoute()`
  • #4558: vcollectionheader storybook visual regression test broken

Infra

  • #4470: Add caching of static assets to frontend Nginx

Ingestion Server

  • #3912: Upload Ingestion Server's TSV files to AWS S3

openverse-infrastructure

Merged PRs

API

  • #951: Fix and improve api-management-command script

Frontend

  • #940: Cache frontend assets at edge for 3 days

Infra

  • #924: Add `StatusCheckFailed` alarms for EC2 services
  • #940: Cache frontend assets at edge for 3 days
  • #942: Add Grafana PDC
  • #944: Touch up indexer worker pools to match IP requirements
  • #947: Bypass WAF for Cloudflare Access services
  • #948: Ignore changes to `actions_enabled` on externally controlled alarm
  • #951: Fix and improve api-management-command script
  • #956: Block malicious ASNs and UA string pattern 2024-06-27/28 incident

Management

  • #955: 🔄 synced file(s) with WordPress/openverse

Closed issues

API

  • #950: Disable migration running during management command executions

Frontend

  • #927: Change frontend edge caching rules

Infra

  • #254: Audit logging costs and find savings
  • #792: Add EC2 instance state change monitor
  • #927: Change frontend edge caching rules
  • #941: Wire up Grafana PDC
  • #943: Add Cloudflare WAF skip rule for Airflow
  • #950: Disable migration running during management command executions

#openverse, #week-in-openverse

A week in Openverse: 2024-06-17 – 2024-06-24

openverse

Merged PRs

Analytics

  • #4330: Add catalog indexer worker

API

  • #4330: Add catalog indexer worker
  • #4500: Publish changelog for api-2024.06.17.15.33.56
  • #4508: Log dead link verification request timings

Catalog

  • #4330: Add catalog indexer worker
  • #4473: Fix trim and deduplicate tags deduplication
  • #4483: Changes all sensible occurrences of the just commands to have them run using ov
  • #4488: Publish changelog for catalog-2024.06.13.17.07.54
  • #4501: Publish changelog for catalog-2024.06.17.15.33.56
  • #4502: Update dependency apache-airflow to v2.9.2 [SECURITY]
  • #4524: Explicitly include Filter Data step in ingestion server removal IP
  • #4532: Bump requests from 2.31.0 to 2.32.2 in /indexer_worker

Documentation

  • #4465: Add data flow diagram for various ETL steps in pipelines
  • #4483: Changes all sensible occurrences of the just commands to have them run using ov
  • #4486: Update works count on the frontend
  • #4488: Publish changelog for catalog-2024.06.13.17.07.54
  • #4498: Publish changelog for frontend-2024.06.17.15.33.55
  • #4500: Publish changelog for api-2024.06.17.15.33.56
  • #4501: Publish changelog for catalog-2024.06.17.15.33.56
  • #4518: Update current_maintainers.md to add @zackkrida
  • #4524: Explicitly include Filter Data step in ingestion server removal IP

Frontend

  • #4446: Stop opening links in a new tab
  • #4483: Changes all sensible occurrences of the just commands to have them run using ov
  • #4486: Update works count on the frontend
  • #4498: Publish changelog for frontend-2024.06.17.15.33.55

Infra

  • #4491: Use a persistent container for `ov`
  • #4508: Log dead link verification request timings
  • #4527: Set `ov` workdir to current working directory

Ingestion Server

  • #4330: Add catalog indexer worker
  • #4519: Bump urllib3 from 2.2.1 to 2.2.2 in /ingestion_server
  • #4524: Explicitly include Filter Data step in ingestion server removal IP

Management

  • #4330: Add catalog indexer worker
  • #4483: Changes all sensible occurrences of the just commands to have them run using ov
  • #4496: Fix ov reference in hooks
  • #4503: Bump urllib3 from 2.1.0 to 2.2.2 in /utilities/project_planning
  • #4504: Make read contents permission explicit for PR automations
  • #4506: Prevent concurrency between release app and draft releases
  • #4511: Bump urllib3 from 2.2.1 to 2.2.2 in /automations/python
  • #4525: Make `ov clean` work when a container, image or volume does not exist
  • #4537: Sync the dependencies for PR automation init workflow to infra repo

Closed issues

API

  • #3199: Avoid API failure when requests URL params aren't fully encoded
  • #3480: Bad Request error for url from Europeana when requesting thumbnail

Catalog

  • #4147: Implement new catalog indexer-worker
  • #4456: Update ingestion server removal IP to include plan for filtering tags

Documentation

  • #4455: Document current & desired ETL steps and data flow
  • #4480: Update the record count on the homepage
  • #4482: Update references to our developer tools to have the `./ov` prefix

Frontend

  • #496: Do not open external links in new tabs
  • #519: `Unable to get property 'name' of undefined or null reference` in useMatchRoute on Edge
  • #520: `TypeMismatchError` on search in Edge
  • #4480: Update the record count on the homepage

Infra

  • #4490: Refactor `ov` to create a persistent container

Ingestion Server

  • #4456: Update ingestion server removal IP to include plan for filtering tags

Management

  • #4422: `ov` hooks should reference the `ov` script directly, rather than relying on it being in the PATH
  • #4505: Prevent race condition with "Draft release" and "Release app" workflows

openverse-infrastructure

Merged PRs

Catalog

  • #937: Bump airflow to rel-2024.06.17.15.33.56

Frontend

  • #938: Fix duplicate nuxt alarms clashing

Infra

  • #930: Remove unnecessary policy from ECS task role
  • #936: Fix ansible/exec recipe
  • #938: Fix duplicate nuxt alarms clashing

Ingestion Server

  • #931: Bump ingestion server to rel-2024.06.13.17.07.56

Management

  • #939: 🔄 synced file(s) with WordPress/openverse
  • #946: 🔄 synced file(s) with WordPress/openverse

Closed issues

Infra

  • #216: Remove the execution role from the task role
  • #844: Exclude `/api/event` endpoint from Nuxt HTTP 5XX response alarm

#openverse, #week-in-openverse

A week in Openverse: 2024-06-10 – 2024-06-17

openverse

Merged PRs

API

  • #4415: Add `backfillmoderationdecision` management command
  • #4444: More precisely handle waveform generation failures
  • #4467: Publish changelog for api-2024.06.07.17.19.06

Catalog

  • #4068: Add verbose logging option to `ProviderDataIngester`
  • #4429: Add DAG to trim and deduplicate tags
  • #4447: Capture thumbnails during europeana ingestion
  • #4460: Update the 'updated_on' column during popularity refresh
  • #4481: Moved by tag from the fuzzy match group to exact match

Documentation

  • #4441: Add favicon to Storybook
  • #4466: Publish changelog for frontend-2024.06.07.17.19.06
  • #4467: Publish changelog for api-2024.06.07.17.19.06
  • #4485: Order quickstart links, add missing catalog link
  • #4487: Publish changelog for ingestion_server-2024.06.13.17.07.56

Frontend

  • #4441: Add favicon to Storybook
  • #4442: Tags page copy
  • #4466: Publish changelog for frontend-2024.06.07.17.19.06

Ingestion Server

  • #4487: Publish changelog for ingestion_server-2024.06.13.17.07.56

Management

  • #4441: Add favicon to Storybook
  • #4472: Fix ov corepack and pdm existence issues

Closed issues

API

  • #3641: Create `ModerationDecision` backfill management command
  • #4218: Audio waveform should return 424 instead of 500 when waveform cannot be generated
  • #4474: The API `result_count` is no more than 240 for unauthenticated requests

Catalog

  • #1420: Add verbose logging option to `ProviderDataIngester`
  • #4403: Capture thumbnails during Europeana ingestion
  • #4453: Remove deny-listed tags in the catalog with the `batched_update` DAG
  • #4464: Move "by" tag contains filter to tag exact match filter

Documentation

  • #4479: Link to the catalog quickstart guide from the central quickstart page

Infra

  • #2037: Move Openverse API and catalog to `openverse.org` subdomains
  • #4489: Environment variables set when running `ov` not passed to the container

Management

  • #4468: `ov` will hang silently if `corepack` is used and there is an update to PNPM
  • #4469: `ov` does not capture error output if `pdm` not installed on host

openverse-infrastructure

Merged PRs

Infra

  • #920: Remove openverse.engineering Cf Access rules and update documentation
  • #928: Move Nuxt 3 to prod, create new listener rule for split testing

Management

  • #926: 🔄 synced file(s) with WordPress/openverse

Closed issues

Infra

  • #609: Use pre-commit and lint setup identical to the monorepo
  • #785: Remove any remaining Cloudflare resources from `openverse.engineering` zone

Management

  • #438: Enable merge queues and require PRs to be up-to-date before merging
  • #609: Use pre-commit and lint setup identical to the monorepo

#openverse, #week-in-openverse

A week in Openverse: 2024-06-03 – 2024-06-10

openverse

Merged PRs

API

  • #4402: Rename ContentProvider to ContentSource
  • #4419: Update docker.io/redis Docker tag to v7.2.5
  • #4434: Publish changelog for api-2024.06.03.15.35.02
  • #4440: Handle tags without provider in media admin view

Catalog

  • #4366: Add catalog media properties documentation

Documentation

  • #4366: Add catalog media properties documentation
  • #4432: Update docs to recommend blobless cloning strategy
  • #4435: Add a link to the committer announcements in the committer docs
  • #4436: Update assets in the documentation
  • #4448: Updated Playwright Codegen broken link
  • #4449: Jest docs broken link fixed

Frontend

  • #4420: Update pnpm to v9.1.4
  • #4423: Update Node.js to v20.14.0
  • #4424: Update dependency @playwright/test to v1.44.1
  • #4425: Update dependency eslint-plugin-tsdoc to ^0.3.0
  • #4426: Update dependency prettier-plugin-tailwindcss to v0.6.1
  • #4428: Ensure required DB extension is installed before attempting to setup plausible
  • #4431: Add Nuxt 3 folders to gitignore
  • #4433: Publish changelog for frontend-2024.06.03.15.35.03
  • #4437: Delete `frontend/src/stories/` directory
  • #4445: Update pnpm to v9.2.0

Ingestion Server

  • #4418: Update dependency elasticsearch to v8.13.2
  • #4443: Revert "Save cleaned data of Ingestion Server to AWS S3 (#4163)"

Management

  • #4392: Add load testing script for frontend
  • #4416: Move NGINX-based services out of the API profile
  • #4421: Update workflows
  • #4438: Overhaul the complete labelling system
  • #4450: Fix incorrect brackets in PR automation
  • #4451: Update pr_automations.yml with missing character
  • #4462: Bump tornado from 6.4 to 6.4.1 in /utilities/project_planning

Closed issues

API

  • #3943: Implement logging for moderation events
  • #3944: Implement and surface value-based deferred metrics
  • #3946: Implement and surface list-based deferred metrics
  • #4289: CI + CD builds `nginx` image during API up
  • #4346: Rename the `ContentProvider` model to `ContentSource`
  • #4439: `/api/api/admin/media_report.py, line 387, in change_view` can fail if the tag does not have a provider

Catalog

  • #2187: Create the media properties description file
  • #4255: iNaturalist is no longer able to access S3

Documentation

  • #4329: Dramatically improve cloning speed for contributors
  • #4395: Add a favicon to our Docs site

Frontend

  • #3972: Update references to audio works to use "audio track(s)"
  • #4391: Create a script for load-testing the frontend

Management

  • #1968: Implementation Plan: Computer vision metadata for content reports
  • #3823: Seek alternatives to `banyan/auto-label`
  • #4203: Stack label is not applied to contributor PRs
  • #4391: Create a script for load-testing the frontend
  • #4400: Local Plausible setup can fail

openverse-infrastructure

Merged PRs

Infra

  • #916: Redirect all .engineering API requests
  • #918: Add nuxt-preview cache rule
  • #921: Update .engineering to .org redirect to exclude Gutenberg media inserter requests
  • #922: Bypass cache and WAF for non-production frontends with load testing UA string

Management

  • #923: Add Princewill Onyenanu (madewithkode) as a committer

Closed issues

API

  • #781: Open PR in Gutenberg to point integration to `api.openverse.org`
  • #782: Open PR to point Jetpack integration to api.openverse.org
  • #783: Remove header check from Cloudflare redirect rule

Infra

  • #779: Redirect production API requests to `api.openverse.org` when a special testing header is present
  • #784: Replace API openverse.engineering Cloudflare domain records with noops
  • #787: Downgrade openverse.engineering Cloudflare plan to the free tier
  • #917: Add cache rules for `nuxt-preview.openverse.org` to not cache it in Cloudflare

Management

  • #740: PR labeller should apply stack labels for infrastructure repo

#openverse, #week-in-openverse

A week in Openverse: 2024-05-27 – 2024-06-03

openverse

Merged PRs

API

  • #4198: Warn on `license_url` computation in the API
  • #4360: Add favicon to Django API
  • #4372: Reduce permissions of default authentication scope
  • #4376: Configure IPython configuration dir in the API
  • #4386: Make media items the centre for all moderation activity
  • #4387: Make miscellaneous improvements to the API developer experience
  • #4394: Publish changelog for api-2024.05.27.15.21.38
  • #4397: Revert "Change search query approach to include only available providers (#4238)"
  • #4398: Publish changelog for api-2024.05.28.21.25.54
  • #4414: Add report creation, better filtering and more improvements to admin views for media

Catalog

  • #4370: Modify `add_license_url` DAG to use `batched_update`
  • #4385: Always assume special urgency for contributor PR pings
  • #4388: Added documentations for how to run DAGs in development alongside how to add new documentations.

Documentation

  • #4385: Always assume special urgency for contributor PR pings
  • #4388: Added documentations for how to run DAGs in development alongside how to add new documentations.

Frontend

  • #4339: Fix recent searches keyboard navigation
  • #4393: Publish changelog for frontend-2024.05.27.15.21.40
  • #4396: Improve accessibility labels for filters tab and button

Ingestion Server

  • #4382: Drop `ORDER BY` clause from copy step when adding a limit
  • #4390: Publish changelog for ingestion_server-2024.05.27.13.36.10

Management

  • #4343: Dockerfy the Openverse development environment
  • #4389: Fix path to banner in `README.md`
  • #4396: Improve accessibility labels for filters tab and button
  • #4399: Add recipes for cleaning up
  • #4401: Make Dockerfied development environment compatible with macOS
  • #4409: Add support for aliases to `ov`
  • #4410: Ignore v8 compile cache

Closed issues

API

  • #3638: Add content moderation actions to expanded media admin view
  • #3639: Soft lock moderation actions for works in review by a moderator
  • #4324: Reduce permissions of default authentication scope
  • #4341: Add favicon to Django API
  • #4412: Make admin media endpoint work for all media items not just those with reports
  • #4413: Include report filing in admin media view

Catalog

  • #1093: Remove the Community Involvement handbook page
  • #1095: Remove the provider ingestion script refactor handbook page
  • #3885: Backfill `license_url` field for images where it's null in the meta_data
  • #4348: The `add_license_url` DAG keeps timing out

Documentation

  • #4356: Create a document for how to start the catalog stack and run a DAG for testing

Frontend

  • #480: Refactor recent searches to reduce code duplication
  • #957: Better accessible name for Filters button and tab
  • #3195: Looped in the recent searches when browsing with keyboard

Ingestion Server

  • #4381: Drop `ORDER BY` clause from copy step of image data refresh when adding a limit

Management

  • #2068: Make linting more contributor-friendly
  • #4137: Create a "dev dependencies check" script for identifying what a contributor may need
  • #4327: Update the PR Review Reminder DAG with special timing for non-maintainers
  • #4404: Replace single usage of perl with Python
  • #4407: Add support for aliases in `ov`

openverse-infrastructure

Merged PRs

Management

  • #913: 🔄 synced file(s) with WordPress/openverse

Closed issues

API

  • #895: Incorrect log line separator definition for API

Documentation

  • #786: Update references to openverse.engineering domains in our public and internal documentation to use openverse.org instead

Infra

  • #895: Incorrect log line separator definition for API

#openverse, #week-in-openverse

A week in Openverse: 2024-05-20 – 2024-05-27

openverse

Merged PRs

API

  • #4238: Change search query approach to include only available providers
  • #4334: Add 'revoked' field to ThrottledApplication to enable easily revoking access to client applications violating openverse TOS
  • #4362: Publish changelog for api-2024.05.20.15.14.53
  • #4377: Publish changelog for api-2024.05.23.15.02.00
  • #4380: Remove overridden function that doesn't do anything over super

Catalog

  • #4297: Set up airflow variable defaults with descriptions automatically
  • #4345: Fix Slack message formatting for ES health alert
  • #4357: Convert longer media `varchar` fields to `text` in the catalog db
  • #4369: Use `.venv` for catalog virtualenv instead of `venv`
  • #4378: Update dependency apache-airflow to v2.9.1 [SECURITY]

Documentation

  • #4302: Implementation Plan: Machine-generated tags on the frontend
  • #4326: Document retired node replacement in ES
  • #4383: Update link to openverse-attribution documentation

Frontend

  • #4313: Add frontend media documentation
  • #4361: Publish changelog for frontend-2024.05.20.15.14.53
  • #4363: Fix frontend to include languages that do not have iso-639-1 codes
  • #4368: Install caniuse-lite as a frontend dev dependency
  • #4375: Only set the user-agent header on the server

Ingestion Server

  • #4357: Convert longer media `varchar` fields to `text` in the catalog db
  • #4358: Add logs to cleaning steps in the ingestion server and skip saving tags
  • #4364: Publish changelog for ingestion_server-2024.05.20.19.47.22
  • #4365: Bump requests from 2.31.0 to 2.32.0 in /ingestion_server

Management

  • #4384: Bump requests from 2.31.0 to 2.32.2 in /automations/python

Closed issues

API

  • #673: Move audio thumbnail retrieval into grouped query
  • #688: Use domain in primary API docs README
  • #694: The mature filter is not working
  • #736: Use alternate method for getting fast subset of rows
  • #739: Notifications when receiving content reports
  • #1055: Test issue to check the CI
  • #1232: Integrity error causes oauth registration view to 500
  • #4076: Exclude media from sources without `ContentProvider` record from search
  • #4321: Add ability to revoke access to specific Openverse API registered client applications

Catalog

  • #1436: Configure pools & priority weights
  • #4109: Use `.venv` for catalog virtualenv
  • #4202: Set up Airflow Variable defaults with descriptions automatically
  • #4312: Convert longer media `varchar` fields to `text` in the catalog database

Documentation

  • #4039: Implementation Plan: Determine and design how machine-generated tags will be displayed/conveyed in the Frontend

Frontend

  • #2766: Set UA string for frontend API requests server-side only
  • #2904: Refused to set unsafe header "User-Agent"
  • #4025: Write TSDoc to document frontend fields
  • #4367: Browserlist (caniuse-lite) DB needs updating on the frontend

openverse-infrastructure

Merged PRs

API

  • #894: Improve support for initializing ES nodes in the userdata script and ansible playbook

Documentation

  • #912: Update contact information for Europeana

Infra

  • #884: Convert Kibana to `immutable-ec2-service`
  • #891: Use non-inference based container definition sensitivity filtering
  • #893: Remove dangling references to airflow.openverse.engineering
  • #905: Challenge repeat 401/403 requesters
  • #906: Fix immutable ec2 service deploy workflow expression usage
  • #907: Include user's SSH configuration file

Ingestion Server

  • #908: Rollback `prod` ingestion server, bump `dev`, re-enable data refresh limit and set `CLEANUP_BUFFER_SIZE`

#openverse, #week-in-openverse

A week in Openverse: 2024-05-13 – 2024-05-20

openverse

Merged PRs

API

  • #4310: Use explicit through table for media/decision many-to-many field
  • #4315: Convert longer media `varchar` fields to `text` in the API
  • #4316: Publish changelog for api-2024.05.13.15.19.42
  • #4322: Remove unnecessary write-time validating URLTextField
  • #4323: Use openverse.org domains for Openverse API in all documentation
  • #4331: Force all API loggers to be structured
  • #4349: Prevent exposing Django Admin features referencing media tables in prod
  • #4351: Fix stray use of `console` logger to `console_structured`

Catalog

  • #4259: Add new data refresh factory
  • #4314: Remove temporary Science Museum DAG now that it is no longer necessary
  • #4353: Remove popularity & matview timeouts from data refresh configurations
  • #4359: Update bucket names to use openverse-catalog and remove openverse-storage

Documentation

  • #4310: Use explicit through table for media/decision many-to-many field
  • #4323: Use openverse.org domains for Openverse API in all documentation

Frontend

  • #4295: Initialize feature flag state on the server
  • #4301: Fix the aria label for search result grid
  • #4317: Publish changelog for frontend-2024.05.13.17.18.44
  • #4323: Use openverse.org domains for Openverse API in all documentation
  • #4336: Fix the skip-to-content button
  • #4338: Focus the trigger when content setting modal is closed

Ingestion Server

  • #4259: Add new data refresh factory

Closed issues

API

  • #4279: Max field length for catalog and API is inconsistent
  • #4286: `<Media>Decision` many-to-many table should reference the media's `identifier` column, not `id`
  • #4311: Convert longer media `varchar` fields to `text` in the API
  • #4320: Remove bespoke `URLTextField` in favor of base `TextField`
  • #4344: Prevent Django Admin default queries on primary media tables in production
  • #4347: `DJANGO_DB_LOGGING` setting breaks the build

Catalog

  • #4146: Create the new data refresh DAG factory and move initial steps into Airflow
  • #4279: Max field length for catalog and API is inconsistent
  • #4352: Remove popularity & matview timeouts from data refresh configurations
  • #4355: Remove uses of `openverse-storage` bucket

Frontend

  • #4223: Initialize the feature flag store state on server request
  • #4300: Incorrect search result aria-label
  • #4335: Skip-to-content button is broken
  • #4337: The content settings button should be focused when the modal is hidden

openverse-infrastructure

Merged PRs

API

  • #898: Bump production API task count by 50%

Frontend

  • #889: Update frontend image tag for nginx on the nuxt 3 preview

Infra

  • #873: Refactor sentry resource management to its own root module
  • #886: Bypass refresh for indexer worker deployments
  • #892: Fix target group and listener race condition for existing ECS services
  • #896: Remove API multiline log configuration
  • #898: Bump production API task count by 50%
  • #899: Remove unused data node heap JVM options file

Ingestion Server

  • #887: Bump ingestion server

Management

  • #888: 🔄 synced file(s) with WordPress/openverse

Closed issues

Catalog

  • #849: Create the Terraform and Ansible resources needed to deploy catalog indexer workers

Infra

  • #839: Move Sentry configuration into its own root module
  • #849: Create the Terraform and Ansible resources needed to deploy catalog indexer workers
  • #875: `generic/service` and `generic/domain-listener-rule` race condition
  • #901: Replace retiring ES production node
  • #902: Replace retiring ES production node

Ingestion Server

  • #849: Create the Terraform and Ansible resources needed to deploy catalog indexer workers

#openverse, #week-in-openverse

A week in Openverse: 2024-05-06 – 2024-05-13

openverse

Merged PRs

API

  • #4229: Add management command to send api move announcement email
  • #4249: Cache repeated thumbnail failures within configured TTL
  • #4250: Update `openverse-attribution` with new features and improvements
  • #4254: Alter Django Admin media views to surface content reports
  • #4263: Add structured logging for the API
  • #4267: Run checks for Python packages in CI
  • #4274: Publish changelog for api-2024.05.06.19.44.35
  • #4278: Tag API images with PDM content hash
  • #4280: Expose provider in the API tags response
  • #4281: Fix send email query
  • #4287: Publish changelog for api-2024.05.07.23.56.36
  • #4288: Remove sendapimoveannouncement management command
  • #4292: Ignore `django-structlog` middleware logs in Sentry

Catalog

  • #4260: Rename old data refresh to `legacy_data_refresh`
  • #4276: Update Science museum urls

Documentation

  • #4231: Initialize Provider store data, UI store and flags store from cookies in a plugin
  • #4235: Convert `.stories.mdx` to `stories.js` for compatibility with Storybook v.8
  • #4266: Migrate documentation to PDM
  • #4296: Fix links to frontend unhealthy host alarms and move PRs template

Frontend

  • #4231: Initialize Provider store data, UI store and flags store from cookies in a plugin
  • #4235: Convert `.stories.mdx` to `stories.js` for compatibility with Storybook v.8
  • #4265: Add storybook smoke test
  • #4272: Storybook test changes to prevent flakiness from `sleep`
  • #4283: Replace sample secret key for plausible with a more obviously safe value
  • #4284: Publish changelog for frontend-2024.05.07.16.44.22
  • #4294: Update pnpm
  • #4299: Update Node to v.20

Ingestion Server

  • #4307: Publish changelog for ingestion_server-2024.05.10.03.01.22

Management

  • #4264: Bump tqdm from 4.64.0 to 4.66.3 in /utilities/dead_links
  • #4267: Run checks for Python packages in CI
  • #4271: Bump jinja2 from 3.1.3 to 3.1.4 in /utilities/project_planning
  • #4277: Give up Pipenv cache to fix CI
  • #4285: Make API image smaller by not including dev dependencies

Closed issues

API

  • #1008: Reduce size of API production image
  • #3432: API structured logging
  • #3635: Create new content report Django admin table view
  • #3742: Write and run management command to send email notifying registered API users of the new `api.openverse.org` domain
  • #4024: Write docstrings to document API fields
  • #4167: Run CI tests for Python sub-packages
  • #4273: Expose provider information in the tags

Catalog

  • #1488: XCOM pull shorthand function
  • #3847: Add variable to disable removing SQL source files for ingestion workflows
  • #4261: Some Science Museum records continue to have invalid URLs

Documentation

  • #4038: Implementation Plan: Determine and design how machine-generated tags will be displayed/conveyed in the API
  • #4138: Move "Contributing to Openverse" docs section to "General development guidelines" header

Frontend

  • #2219: Storybook smoke test does not catch if individual stories are broken
  • #4217: Replace the sample string in a secret with a more obviously safe value
  • #4230: Remove the `ssrRef` from provider store
  • #4234: Convert Storybook stories to `stories.js` CSF format

Ingestion Server

  • #1008: Reduce size of API production image

Management

  • #777: Collect documentation inside Sphinx
  • #3743: Write and publish Make post announcing switch to openverse.org for the API
  • #4167: Run CI tests for Python sub-packages

openverse-infrastructure

Merged PRs

Frontend

  • #877: Bypass caching on static frontend routes when cookies are present

Infra

  • #866: Add golden image Packer build
  • #871: Create indexer worker pools
  • #876: Point UptimeRobot to api.openverse.org; use .org as canonical API domain; redirect .engineering API requests when header is present
  • #881: Add TENCENT ASN to malicious list
  • #885: Fix nuxt-preview deployment dispatch type

Ingestion Server

  • #883: Grant permissions over catalog S3 bucket to the ingestion server

Management

  • #882: Add script for running API management command

Closed issues

API

  • #550: Thumbnail error responses not cached, even if we know they're going to fail

Infra

  • #585: Add workflow for running one-off ECS tasks
  • #780: Update UptimeRobot monitors to point to `api.openverse.org`
  • #843: Combine best of autoscaling group launch templates with Ansible

Management

  • #585: Add workflow for running one-off ECS tasks

#openverse, #week-in-openverse