A week in Openverse: 2023-03-27 – 2023-04-03

openverse

Merged PRs

  • #1113: Bump ipython from 8.11.0 to 8.12.0 in /api
  • #1104: Pass actor for staging deploys with the `-f` flag
  • #1103: Add `GITHUB_TOKEN` to GitHub CLI step
  • #1098: Update other references of media count to 700 million
  • #1067: Fix typo in docs building on `main`
  • #1065: Restore Django Admin views
  • #1063: Use label.yml to determine required labels
  • #1058: Fix issues in the workflow simplifications of #1054
  • #1054: Simplify CI + CD workflow
  • #1051: Pin pnpm version in frontend `Dockerfile`
  • #1048: Bump boto3 from 1.26.84 to 1.26.100 in /ingestion_server
  • #1047: New issue automation fix: convert the jobs into steps to share env variables
  • #1044: Bump bottle from 0.12.24 to 0.12.25 in /ingestion_server
  • #1042: Bump boto3 from 1.26.97 to 1.26.99 in /api
  • #1041: Bump filelock from 3.9.0 to 3.10.7 in /ingestion_server
  • #1040: Bump pytest-order from 1.0.1 to 1.1.0 in /ingestion_server
  • #1039: Bump aws-actions/configure-aws-credentials from 1 to 2
  • #1038: Use `ACCESS_TOKEN` for the Project automation
  • #1034: Dispatch workflows instead of regular reuse to show deployment runs
  • #1031: Use the `issue.node_id` for GraphQL API
  • #1029: Defer the `tags_list` for media models
  • #1028: Absorb `actionlint` into pre-commit
  • #1026: Add stack label to dependabot & sync label PRs
  • #1007: Fix Re-running failed Playwright tests leaves failure comment
  • #990: Retry `up` recipe in case port is occupied
  • #904: Save cleaned up data during the cleanup step

Closed issues

  • #1099: Can't run linting locally
  • #1064: Slack notification when CI/CD fails on main
  • #1061: http in api response
  • #1033: Deployment workflow runs do not show in workflow run history
  • #999: Links in CONTRIBUTING.md are not working
  • #950: Configure TS to treat project as composite
  • #902: Phylopic images are broken
  • #896: Absorb `actionlint` into pre-commit
  • #884: Use label.yml to determine required labels
  • #864: Re-running failed Playwright tests leaves failure comment
  • #861: Add additional logging to the cleanup process
  • #654: Add a list of domains that are known to support TLS to the cleanup step
  • #337: Configure isort to include source paths
  • #734: Incorrectly formatted OAuth requests cause uncaught errors
  • #634: Investigate alternatives to Google Analytics

openverse-catalog

Merged PRs

  • #1074: Create DAG to fix PhyloPic's `foreign_identifier` column
  • #1072: Offset iNaturalist DAG from monthly by one day
  • #1071: 🔄 synced file(s) with WordPress/openverse
  • #1070: Update pgcli version to 3.5.0
  • #1069: 🔄 synced file(s) with WordPress/openverse
  • #1068: Update Freesound to quarterly, extend timeout
  • #1067: 🔄 synced file(s) with WordPress/openverse
  • #1064: Bump apache-airflow[amazon,http,postgres] from 2.5.1 to 2.5.2
  • #1060: Update PhyloPic DAG to use API v2
  • #1058: Log last query_params hit before AirflowTaskTimeout
  • #1054: Add isort configuration file
  • #1047: Update Flickr large batch handling

Closed issues

  • #1073: Update PhyloPic's `foreign_identifier` field
  • #1025: Delay iNaturalist from `@monthly`
  • #998: Update Phylopic to use v2 API

openverse-infrastructure

Merged PRs

  • #449: Update actor input for staging
  • #447: Accept actor as input for dispatch and call
  • #441: Add SENTRY_DSN to ECS API
  • #435: Update env vars to fix URL scheme for related endpoints
  • #433: Bump catalog-airflow to v1.5.2
  • #432: 🔄 synced file(s) with WordPress/openverse
  • #431: Point staging frontend to staging API
  • #430: 🔄 synced file(s) with WordPress/openverse
  • #429: Add documentation for running the staging data refresh
  • #427: Remove unnecessary branch checks
  • #423: Remove openverse-api modules from legacy environments

Closed issues

  • #439: Add Sentry DSN to API ECS configuration
  • #393: Decommission legacy API module
  • #390: Decommission `api-dev.openverse.engineering`
  • #317: Update the list of modules

#openverse, #week-in-openverse

A week in Openverse: 2023-03-20 – 2023-03-27

openverse

Merged PRs

  • #1014: Pass ISSUE_ID and PROJECT_ID to the new_issue workflow
  • #1011: Add release-drafter API configuration to enable testing in #987
  • #1006: Add CNAME in other use of `actions-gh-pages`
  • #1005: Add docs CNAME to config
  • #1000: Fix diagrams with transparent background in README.md of "ingestion_server" for dark mode.
  • #994: Fix local build of the API and add its `recreate` just command
  • #991: Update URLs to point to docs.openverse.org
  • #987: Add tag app release action
  • #986: Remove XML from the API
  • #984: Add GH_TOKEN to the gh steps
  • #981: Switch to internal header on single results
  • #980: Change post status to 'publish'
  • #979: Add decoding of the strings that don't have backslashes
  • #977: Skip build and publish job if nothing to do
  • #975: Fix needs check on api staging deployment
  • #974: Improve documentation for partial stack setups
  • #973: Use formless default browsable API renderer
  • #969: make init script more system agnostic
  • #967: Update general setup guide for macOS
  • #965: Configure ingestion_server as a known first party for isort
  • #963: Swap from flake8 to ruff
  • #962: Update to new link for pipenv install instructions
  • #961: Update opensearch.xml to fix bad url
  • #959: Fix link to dev flow docs
  • #955: Make `searchTerm` for VAudioTrack and VImageCell optional
  • #952: Update Nuxt to v.2.16.3
  • #951: Update TS configuration to use composite projects and fix VSCode integration
  • #945: Log DB queries in production
  • #944: Absorb `build-nginx` job into `build-images` job
  • #943: Make Plausible setup idempotent

Closed issues

  • #1012: new_issue workflow is failing to add the issue to the new project
  • #993: Can't spin up the API locally
  • #983: The workflow for new project automation needs a GitHub token
  • #982: Remove XML support from Django API
  • #976: Cutting a release does not successfully run CI/CD workflow
  • #972: Dump Django URL resolver configuration and confirm all routes are expected API routes
  • #971: Staging API does not automatically deploy after merge to main
  • #970: Reporting HTML view `SELECT`s all media records
  • #968: [Improvement] Diagrams with transparent background are not great in dark mode
  • #966: General setup guide requires Homebrew, but has no info on installation
  • #960: timeout is required to successfully create the elastic search indexes using the just file
  • #958: Tags incorrectly escaped utf-8 characters to `uxxxx`
  • #953: Make `searchTerm` non-required for Audio track and Image cell
  • #942: Plausible DB setup is not idempotent
  • #923: Search on single result page does not work intuitively
  • #900: Dead thingiverse images are not filtered out because they return 403
  • #899: Include `collectstatic` step inside API Dockerfile
  • #892: Document how to setup just parts of the stack
  • #868: Add issues to the new Project and set Priority field value
  • #866: Use profiles in Docker Compose
  • #859: Consider JSON5 for `package.json` files
  • #388: API ECS Migration
  • #463: Single result page should use header with navigation links
  • #478: Optimize CI pipeline avoiding running jobs for unrelated changes
  • #482: Filter counter in button and tab
  • #675: Use `thumbnail_url` for thumbnail generation when present
  • #755: Build UI for API consumers to get their key and check their usage (original #335)

openverse-catalog

Merged PRs

  • #1065: Bump pre-commit from 3.1.1 to 3.2.0
  • #1063: Add required stack label to dependabot PRs
  • #1057: 🔄 synced file(s) with WordPress/openverse
  • #1052: Update README.md with documentation reference
  • #1049: Handle the upper case licenses in the add_license_dag
  • #1048: Remove watermarked setting for SMK
  • #1040: Add SuggestedSubProvider type
  • #1011: Add option to skip specific ingestion errors

Closed issues

  • #1020: Revert wartermark value change for SMK after files are deleted
  • #702: Add configuration to skip specific ingestion errors
  • #394: Add old repo to documentation

openverse-infrastructure

Merged PRs

  • #428: 🔄 synced file(s) with WordPress/openverse
  • #426: Add monorepo required checks
  • #424: 🔄 synced file(s) with WordPress/openverse
  • #422: Reduce log levels in the API and disable DB query logging
  • #419: Send alerts to alerts channel
  • #405: Upgrade cloudflare ssl mode to strict for all managed zones
  • #403: Update ES node metadata & Cloudwatch dashboard for newly provisioned node

Closed issues

  • #425: Update required checks for the monorepo

#openverse, #week-in-openverse

A week in Openverse: 2023-03-13 – 2023-03-20

openverse

Merged PRs

  • #939: Add console_prod handler to query logging to allow in production
  • #936: Always build both api & ingestion server images for either service
  • #935: Deregister media model admins and dependents
  • #934: Add Django DB logging option
  • #933: Add application name to DB
  • #931: Remove Docker image loading from docs steps
  • #930: Fix links on the main Storybook page
  • #927: Fix global audio player's close button
  • #925: Build `api` when ingestion server changes
  • #922: Add `.github` to CODEOWNERS
  • #918: Fix global audio player layout
  • #917: Update pinia and pinia/testing
  • #916: Update Vue from 2.7.10 to 2.7.14
  • #915: Fix background color on report pages
  • #910: Add user validation, concurrency, manual runs to deployment workflow
  • #909: Add get-image-tag as dependency for nginx build step
  • #895: Skip more jobs based on changed files
  • #894: Simplify and fix bundle size workflow
  • #893: Only generate POT file if `en.json5` has changed
  • #891: Add ability to boost search results by authority
  • #889: Prepare Docker setup for monorepo
  • #888: Adding brand assets
  • #886: Split deployment workflow into 4 separate workflows
  • #882: Only run stack label addition step on pull requests
  • #873: Project Proposal: Detecting, filtering, and blurring results that include sensitive terms
  • #844: Implement analytics in Nuxt
  • #828: Move peerDependencyRules to root package.json
  • #397: Provider tally extraction script

Closed issues

  • #929: The links in Storybook have not been updated to monorepo
  • #928: Frontend PRs fail CI
  • #926: Global audio player cannot be closed when the audio is playing
  • #921: Action Required: Fix Renovate Configuration
  • #920: Django check in CI is flakey because of plausible check
  • #913: Global audio player is broken
  • #908: `SEMANTIC_VERSION` is not supplied to nginx image
  • #906: Port conflict with Slack
  • #879: Yellow background when reporting an image from Gutenberg
  • #878: Update reverse proxy to allow for path prefix rewriting on the API
  • #877: Refactor deployment workflow into separate workflows per app and environment
  • #871: Jamendo thumbnails are failing
  • #865: Move Docker-only directories from root to `docker/`
  • #849: Skip frontend docker image build and its tests on non-frontend code changes
  • #827: Move pnpm peerDependencyRules.allowedVersions to the root package.json
  • #825: Set up wrangling for events
  • #380: Initial analysis of Redis provider tallies pre & post iNaturalist ingestion
  • #689: Add additional logging around search_controller's ES query building

openverse-catalog

Merged PRs

  • #1051: Adjust schedule for long running queries termination
  • #1050: Add DAG for terminating long-running queries
  • #1045: Use Python to group items by license to speed up the query
  • #1003: Remove alternate image extraction from SMK, fix foreign landing URL

Closed issues

  • #1044: `add_license_url` DAG is inefficient and fails due to timeout
  • #1043: The Noun Project
  • #1039: Allow Flickr backfill to complete, turn notifications back on
  • #875: Duplicates identified in SMK data
  • #826: Provider: The Noun Project

openverse-infrastructure

Merged PRs

  • #420: secure staging api admin
  • #418: Add db logging and debug log level to production api
  • #417: Add api-production subdomain to access
  • #415: Add user validation, concurrency, manual runs to deployment workflow
  • #414: Add existing API aliases to ECS deployment
  • #413: Restore frontend capacity
  • #412: Add separate deployment workflows per environment/service
  • #411: Add photon auth key to ECS deployment
  • #401: Make desired count configurable, set to 5 in production

Closed issues

  • #399: Increase API ECS service count to match current EC2 production
  • #392: Point `api.openverse.engineering` to `api-production.openverse.engineering`
  • #366: Move staging ECS API to staging.openverse.org/api path route instead of openverse.engineering subdomain.

#openverse, #week-in-openverse

A week in Openverse: 2023-03-06 – 2023-03-13

openverse

Merged PRs

  • #890: Add a stemming override for the word “universe”
  • #885: Add stack to label sync, allow emoji to be defined for whole group
  • #872: Make deployment action “uses” explicit
  • #870: Update sentry; fix config
  • #863: Fix weekly update workflow
  • #862: Add feature flag for fake marking results as sensitive
  • #858: Remove `prepare` script to prevent i18n overwrites inside Docker
  • #851: Make codeowners more specific
  • #848: Identify and fix cause of cURL error 23 when setting up pre-commit
  • #846: Bump boto3 from 1.26.81 to 1.26.84 in /ingestion_server
  • #843: Add preferences for analytics
  • #842: Update homepage copy to “700 million”
  • #841: Bump boto3 from 1.26.81 to 1.26.84 in /api
  • #840: Add production API deployment action
  • #838: Bump elasticsearch-dsl from 7.4.0 to 7.4.1 in /api
  • #836: Bump python-decouple from 3.7 to 3.8 in /api
  • #835: Bump python-decouple from 3.7 to 3.8 in /ingestion_server
  • #832: Bump elasticsearch-dsl from 7.4.0 to 7.4.1 in /ingestion_server
  • #831: Bump pytest from 7.2.1 to 7.2.2 in /ingestion_server
  • #830: Bump renovatebot/github-action from 34.152.5 to 34.154.4
  • #806: Fix crash when more than one `q` parameter is provided in URL
  • #804: RFC + POC: Add Plausible for analytics
  • #798: Handle incorrect types in cookie value
  • #788: Update home link screen reader text
  • #786: Add stack label if available, make get-changes composite action
  • #785: Add actions to search forms

Closed issues

  • #871: Jamendo thumbnails are failing
  • #857: Locales missing in Docker images
  • #854: Add non-production feature flag for marking half of results as sensitive
  • #852: `TypeError` term.trim is not a function
  • #850: PR review requests are not following the CODEOWNERS assignements
  • #845: Handle `precommit` recipe exiting with code 23
  • #839: Update homepage copy to “700 million”
  • #821: Add feature flag for analytics
  • #782: Invalid cookie value causes an error
  • #781: `setSearchTerm` fails when `query.q` is an array
  • #760: Update “Week in Openverse” script to support monorepo
  • #521: `DataCloneError` raised on search in Safari
  • #522: Switch to `ENTRYPOINT` instead of `CMD` in our Dockerfile
  • #545: Dependency Dashboard

openverse-catalog

Merged PRs

  • #1042: Update `LICENSE` to match main repo
  • #1041: Tweak Flickr time division settings, add logs
  • #1038: Add trailing slash to Jamendo thumbnail URLs
  • #1037: 🔄 synced file(s) with WordPress/openverse
  • #1036: Temporarily turn off scheduled image data refreshes, increase matview refresh timeout
  • #1035: Add logging to iNaturalist date check
  • #1034: Add flickr sub provider auditing dag
  • #1031: Adjust Flickr max records to account for incorrect reporting
  • #1028: Improve license URL validation
  • #1005: Add a DAG for backfilling license_url when meta_data is null
  • #976: Add Airflow variable used to configure overrides for task timeouts

Closed issues

  • #1027: `_get_valid_cc_url` makes a network request even for known valid license urls
  • #1024: Improve iNaturalist date check logging
  • #724: Allow execution timeouts to be overridden by Variables
  • #676: Identify new Flickr sub-providers
  • #511: Ensure that all media have `license_url` in `meta_data` field

openverse-infrastructure

Merged PRs

  • #410: 🔄 synced file(s) with WordPress/openverse
  • #408: Increase frontend memory and CPU back up
  • #406: Bump API to v4.0.0, point bump script to monorepo
  • #402: Add API service to ECS Cloudwatch dashboards
  • #398: Stand up production API on ECS
  • #397: Construct API URLs dynamically, change staging domain
  • #386: Add stack check as required for monorepo
  • #381: Set frontend memory and cpu to match staging

Closed issues

  • #407: Update deployment action to generate a block per service
  • #400: Set up API in Cloudwatch ECS dashboard
  • #391: Set up and deploy production API on ECS

#openverse, #week-in-openverse

A week in Openverse: 2023-02-13 – 2023-02-20

openverse

Merged PRs

  • #418: Update sync to account changes to PR template path
  • #407: Change ‘Reverted’ to ‘Rollback’ in project docs
  • #406: Remove front matter from project proposal template
  • #405: Fix authenticated logins for the label PR action
  • #402: Update actions/checkout action to v3
  • #401: Restore the functionality of the weekly Make post
  • #398: Bump ipython from 8.3.0 to 8.10.0 in /automations/python
  • #396: Fix project automation logic around closed PRs
  • #366: Proposal: Openverse project process

Closed issues

  • #414: Remove the old header code
  • #413: Add option to sort search results by created_on
  • #276: Omit issues with closed PRs of moving to the “In Progress” column in the project board

openverse-catalog

Merged PRs

  • #994: Add an “Airflow Alert” issue template
  • #969: Add dayshift to tsv filenames for reingestion workflows

Closed issues

  • #1000: Jobe Alert
  • #997:
  • #768: Load_data steps for `image` skipped during Wikimedia reingestion
  • #766: Update to new version of Phylopic API
  • #689: Investigate converting iNaturalist to an incremental DAG
  • #684: inaturalist data quality: issue warning with missing photo ids

openverse-api

Merged PRs

  • #1144: Add screen to API Docker image
  • #1143: Bump django from 4.1.6 to 4.1.7 in /api
  • #1142: Add API rollback workflow
  • #1141: 🔄 synced file(s) with WordPress/openverse
  • #1140: Bump ipython from 8.9.0 to 8.10.0 in /api
  • #1139: Bump ipython from 8.9.0 to 8.10.0 in /ingestion_server
  • #916: Add option to sort search results by `created_on`

openverse-frontend

Merged PRs

  • #2192: Simplify `get-translations.js` and add error handling and fallbacks
  • #2191: Add a directive for translators to not translate Netherlands
  • #2190: 🔄 synced file(s) with WordPress/openverse
  • #2188: Download translations in bulk to prevent GlotPress throttling
  • #2187: Use aria-label for WordPress affiliation link
  • #2185: Move the sidebar in the DOM order
  • #2184: Reduce GlotPress limit further to ensure all languages
  • #2180: Add “skip to content” links to the homepage and the 404 page; fix footer role
  • #2178: Add “skip to content link” to the Single result pages
  • #2177: Use h1 for the main heading on the homepage
  • #2172: Make translations more reliably present in all environments
  • #2169: Update common’s size estimate to 2.5 billion
  • #2162: Remove the unused old header code
  • #2146: Remove the searchBy creator filter from the filters list
  • #2134: Fix the Search Help (Syntax Guide) links

Closed issues

  • #2189: Add translator note for Dutch translation of `search-guide.example.prefix.content`
  • #2183: Improve accessibility of the WordPress link in the footer
  • #2182: Expanding the filters sidebar does not focus a screen reader on the filters section
  • #2179: Default layout should not nest `footer` inside `main`
  • #2176: The first heading on the home page should be `h1`
  • #2170: Not all locales show up in picker
  • #2163: Remove the `old_header` code
  • #2125: Keyboard navigation to the footer on the search page is impossible
  • #2116: Syntax Guide page links don’t work
  • #1344: Creator filter is unclear

openverse-infrastructure

Merged PRs

  • #379: Deploy API 2.7.6
  • #376: Include Make-related secrets in Terraform
  • #375: Update API email address to openverse.org
  • #373: Update @dhruvkb’s SSH key in `globally_authorized_keys`
  • #372: Sync config with actual infra
  • #371: Include deployment secrets in `WordPress/openverse`
  • #370: Add note about nuxt memory usage after deploy
  • #357: Add CloudWatch agent to API boxes

Closed issues

  • #338: Update API email address to use openverse.org

#openverse, #week-in-openverse

A week in Openverse: 2023-02-08 – 2023-02-15

openverse

Merged PRs

  • #398: Bump ipython from 8.3.0 to 8.10.0 in /automations/python
  • #395: Change authentication token
  • #376: Match frontend linting dependency versions to frontend/pull/2121
  • #369: chore(deps): update alex-page/github-project-automation-plus action to v0.8.3

Closed issues

  • #391: PR Project Automation is failing
  • #389:
  • #387: Baseline SEO improvements
  • #386: iFrame removal

openverse-catalog

Merged PRs

  • #993: 🔄 synced file(s) with WordPress/openverse
  • #974: Update Europeana endpoint
  • #969: Add dayshift to tsv filenames for reingestion workflows

Closed issues

  • #1000: Jobe Alert
  • #997:
  • #768: Load_data steps for `image` skipped during Wikimedia reingestion
  • #766: Update to new version of Phylopic API
  • #109: Update Europeana endpoint and accommodate v2 API changes

openverse-api

Merged PRs

  • #1140: Bump ipython from 8.9.0 to 8.10.0 in /api
  • #1139: Bump ipython from 8.9.0 to 8.10.0 in /ingestion_server
  • #1135: Bump cryptography from 39.0.0 to 39.0.1 in /api
  • #1082: Add zero-downtime deployments & data transformations guide

Closed issues

  • #1030: Add documentation describing the data migration process we should follow

openverse-frontend

Merged PRs

  • #2184: Reduce GlotPress limit further to ensure all languages
  • #2177: Use h1 for the main heading on the homepage
  • #2172: Make translations more reliably present in all environments
  • #2169: Update common’s size estimate to 2.5 billion
  • #2166: Update URL in opensearch.xml
  • #2165: Group localized URLs of the same page in the sitemap
  • #2161: Ensure the lineage is traced correctly
  • #2160: Fix homepage zooming on iPhone
  • #2158: Minify homepage images
  • #2157: Baseline SEO improvements
  • #2156: 🔄 synced file(s) with WordPress/openverse
  • #2155: Fix the header scrolling
  • #2154: Enable new header feature flag in production
  • #2149: Remove ring offset to reduce ring thickness
  • #2147: Tighten the condition for audio playback to continue across pages
  • #2144: Increase horizontal padding in mobile search grid
  • #2140: Update content switcher tabs fonts and icons to match the mockups
  • #2121: Update eslint- and prettier- related dependencies
  • #2120: Update babel-related dependencies
  • #2109: Extract Prometheus module
  • #2104: Add Debounce to filter selection
  • #2101: Update search term when navigating with browser’s back and forward buttons
  • #2093: Serve Prometheus metrics on a separate port; fix metrics development workflow

Closed issues

  • #2176: The first heading on the home page should be `h1`
  • #2170: Not all locales show up in picker
  • #2164: Sitemap is not correct for i18n routes
  • #2159: Missing Translation Strings
  • #2151: Browser zoom-in the search input on mobile when typing
  • #2150: The header is not fixed when scrolling to bottom
  • #2133: Increase padding of search results grid on mobile size
  • #2129: Focus outline thickness is incorrect in the ‘Clear filters’ button
  • #2118: Audio keeps playing on a single image result
  • #2082: Add the search term as a query parameter to the single result page
  • #2010: Minimize the use of JS for layout in the VAudioTrack component
  • #2009: Menu and breakpoint improvement in new header
  • #1295: Homepage search type buttons for small width viewports run off screen in languages with longer labels
  • #810: Requests to invalid / non-existent resources should return a 404 HTTP status
  • #811: Photos > Images should use a server-side, not client-side redirect
  • #812: All pages should output a canonical URL tag
  • #813: Add hreflang directives
  • #505: Debounce filter selection

openverse-infrastructure

Merged PRs

  • #373: Update @dhruvkb’s SSH key in `globally_authorized_keys`
  • #372: Sync config with actual infra
  • #371: Include deployment secrets in `WordPress/openverse`
  • #370: Add note about nuxt memory usage after deploy
  • #369: Bump API to v2.7.5
  • #368: Bump catalog-airflow from v1.5.0 to v1.5.1
  • #367: 🔄 synced file(s) with WordPress/openverse
  • #365: Split next root modules by service

Closed issues

  • #347: Split `next` root modules’ `main.tf` into separate files for each service

#openverse, #week-in-openverse

Applying ECS to the ingestion server/data refresh

This was a passing thought I had that I wanted to note somewhere. Currently the ingestion server is a small Falcon app that runs most aspects of the data refresh, but then also (in staging/prod) interacts with a fleet of “indexer worker” EC2 instances when performing the Postgres -> Elasticsearch indexing.

We have plans for moving the data refresh steps from the ingestion server into Airflow. Most of these steps are operations on the various databases, so they’re not very processor-intensive on the server end. However, the indexing steps are intensive, which is why they’re spread across 6 machines in production (and even then it can take a number of hours to complete).

We could replicate this process in Airflow by setting up Celery-based workers so that the tasks run on a separate instance from the webserver/scheduler. Ultimately I’d like to go this route (or use something like the ECS Executor rather than Celery), but that’s a non-trivial effort to complete.

One other way we could accomplish this would be to use ECS tasks! We could have a container defined specifically for the indexing step, which expects to receive the range on which to index and all necessary connection information. We could then kick off n of those jobs using the EcsRunTaskOperator and use the EcsTaskStateSensor to wait for them to complete. This could be done in our current setup without any new Airflow infrastructure. It’d also allow us to remove the indexer workers, which currently sit idle (albeit in the stopped state) in EC2 until they are used.
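To make the "range on which to index" idea a bit more concrete, here is a minimal Python sketch of how the work might be divided among n tasks. It is only an illustration under assumptions: the function names, the container name `indexer`, the host `db.example.internal`, and the environment variable names are all hypothetical, and the real indexing step may partition its work differently. The dictionaries follow the general shape that ECS accepts for per-task container overrides, so each one could plausibly be handed to a separate task run.

```python
from typing import Dict, List, Tuple


def split_id_range(start: int, end: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split the half-open range [start, end) into n_workers contiguous chunks."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if end < start:
        raise ValueError("end must not be smaller than start")
    base, extra = divmod(end - start, n_workers)
    chunks = []
    lo = start
    for i in range(n_workers):
        # Spread any remainder over the first `extra` chunks.
        size = base + (1 if i < extra else 0)
        chunks.append((lo, lo + size))
        lo += size
    return chunks


def build_overrides(
    chunk: Tuple[int, int],
    container_name: str = "indexer",  # hypothetical container name
    db_host: str = "db.example.internal",  # hypothetical connection info
) -> Dict:
    """Build per-task container overrides carrying the range to index."""
    lo, hi = chunk
    return {
        "containerOverrides": [
            {
                "name": container_name,
                "environment": [
                    # Hypothetical variable names the indexing container would read.
                    {"name": "INDEX_START_ID", "value": str(lo)},
                    {"name": "INDEX_END_ID", "value": str(hi)},
                    {"name": "DATABASE_HOST", "value": db_host},
                ],
            }
        ]
    }


# Six tasks, mirroring the six production indexer workers mentioned above.
chunks = split_id_range(0, 600, 6)
overrides = [build_overrides(chunk) for chunk in chunks]
```

Each entry in `overrides` would correspond to one task to launch, with a sensor per task (or one over the whole batch) deciding when the indexing is finished.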

#airflow, #data-refresh, #ecs, #infrastructure, #openverse

Community Meeting Recap (09 August 2022)

Meeting start

🎉 Done!

👀 Needs review

🚧 In progress

An issue is in the todo column and unassigned.

💬 Agenda discussion

One of our agenda items was already tackled in the previous week, so no discussion on it was necessary. We discussed what else is needed before we can close out & deploy the catalog v1.3.0 milestone.

@krysal also brought to folks’ attention that we still need to run a data refresh in order to confirm some issues we completed in the catalog are addressed downstream.

Meeting end

#openverse, #openverse-weekly-community-meeting

Openverse Prioritization Meeting 2022-08-10

All Openverse contributors are invited to attend a new meeting to review our current projects and roadmap for the rest of the year. The first of these sessions will be held on August 10th 2022 at 1500 UTC. Visit the #openverse channel in the Make WP Slack to see a video chat link prior to the meeting start.

Here are some helpful documents for use during the meeting:

And some background documentation that may help facilitate conversation:

#planning, #prioritization, #roadmap

Mitigating out of terms API usage

Yesterday at 20:20 UTC, we released version 2.5.5 of our API! Along with a few dependency upgrades and DevEx improvements/fixes, this release also brings an important change regarding anonymous API requests. After v2.5.5, any media searches that are made without an API key cannot request more than 20 results per page.

This change was made in order to mitigate behavior we were seeing on the API which was adversely affecting performance for other users, our capacity to update the data that backs Openverse, and our ability to deploy new changes to the API.

Our API Terms of Service state:

– A user must adhere to all rate limits, registration requirements, and comply with all requirements in the Openverse API documentation;

– A user must not scrape the content in the Openverse Catalog;

– A user must not use multiple machines to circumvent rate limits or otherwise take measures to bypass our technical or security measures;

– A user must not operate in a way that negatively affects other users of the API or impedes the WordPress Foundation’s ability to provide its services;

Background

Beginning around May 18th, we saw a significant increase in traffic.

Total requests made to api.openverse.engineering over the last 30 days

While the digital demographics (browser, user agent, OS, device type, etc.) were quite varied, one feature stuck out – these requests were all being made with the page_size=500 parameter.

Total requests made to api.openverse.engineering over the last 30 days using the page_size=500 parameter

Over the course of the last 30 days, these requests constituted almost 80% of our total traffic! While our application is designed to handle this many requests, it is not designed to handle each request querying for 500 results per page (the default page size is 20). As such, this had created significant strain on our Elasticsearch cluster and eventually caused disruptions in the API’s ability to serve results. The image below combines a few of our monitoring tools to show a general correlation between the page_size=500 requests and our Elasticsearch resource utilization.

Request count compared to Elasticsearch resource utilization
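As a rough back-of-the-envelope illustration of why this traffic was so costly, the sketch below compares the number of results requested under a mix of page sizes to a baseline where every request uses the default. It assumes, purely for illustration, that exactly 80% of requests used page_size=500 and that all remaining requests used the default of 20; the function name is hypothetical and actual Elasticsearch load does not necessarily scale linearly with page size.

```python
def relative_result_volume(
    share_large: float,
    large_page_size: int = 500,
    default_page_size: int = 20,
) -> float:
    """Results requested per average request, relative to an all-default baseline."""
    mixed = share_large * large_page_size + (1 - share_large) * default_page_size
    return mixed / default_page_size


# A single page_size=500 request asks for 25 times the default number of results.
per_request_factor = 500 / 20

# With ~80% of requests at page_size=500, the average request asks for
# roughly 20 times as many results as it would at the default page size.
overall_factor = relative_result_volume(0.8)
```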

Even before this release, our application was set up to throttle individual, anonymous users to 1 request/second. These page_size=500 requests were coming from a myriad of different hosts; the initiator was able to circumvent the individual throttles by employing a large number of machines (also known as a botnet). These machines were also predominantly tied to a single data center and a single ASN, which led us to believe this was orchestrated by a single user.
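The following is a small Python sketch of why a per-client throttle does not help against many hosts. It is not the API's actual throttling code: the `PerClientThrottle` class and the `count_allowed` helper are hypothetical, and clients are identified here by a simple string where a real deployment would use something like an IP address or API key.

```python
from typing import Dict


class PerClientThrottle:
    """Allow at most one request per `min_interval` seconds for each client."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last_allowed: Dict[str, float] = {}

    def allow(self, client_id: str, now: float) -> bool:
        last = self._last_allowed.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False  # this client already made a request too recently
        self._last_allowed[client_id] = now
        return True


def count_allowed(n_hosts: int, requests_per_host: int) -> int:
    """Count requests allowed within one second, spread evenly per host."""
    throttle = PerClientThrottle()
    allowed = 0
    for host in range(n_hosts):
        for i in range(requests_per_host):
            now = i / requests_per_host  # timestamps within [0, 1)
            if throttle.allow(f"host-{host}", now):
                allowed += 1
    return allowed


single_host = count_allowed(n_hosts=1, requests_per_host=100)
many_hosts = count_allowed(n_hosts=100, requests_per_host=1)
```

In this sketch, one host sending 100 requests in a second gets a single request through, while 100 hosts sending one request each get all 100 through, which is the circumvention described above.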

This behavior was clearly in violation of our Terms of Service, since it was:

  1. Not using a registered API key for high-volume use
  2. Scraping data from Openverse
  3. Using multiple machines to circumvent the application throttles
  4. Consuming significant enough resources that it impacted other users of Openverse

Mitigation

As mentioned above, we deployed a change which would now return a 401 Unauthorized for any anonymous requests to the API that included a page_size greater than the default of 20. Almost immediately after deployment, we saw this mitigation take effect when observing request behavior:

Screenshot of a Cloudflare analytics page. The graph in the center shows total requests with page_size=500, separated by status code over 6 hours. A consistent number of requests (split between 301 and 200) can be seen starting at 9:00 PST. At 13:00 PST, the number of 401 requests begins to overtake the number of 200 requests. After 13:15, the number of 200 requests drops to zero and all requests returned are 401s.
Total number of page_size=500 requests made over the course of 6 hours, separated by return status code

In the above graph, you can see where we deployed v2.5.5 (~13:00 PST) – the number of 200 OK responses decreased, and the number of 401 Unauthorized responses increased significantly! Eventually all of the page_size=500 requests were being rejected as unauthorized.
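The rule behind those status codes can be sketched in a few lines of Python. This is an illustration of the behavior described in this post, not the API's actual code: the function name and parameters are hypothetical, and the sketch does not model any upper bound that may apply to requests made with an API key.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 20  # default page size stated in this post


def status_for_search(page_size: Optional[int] = None, has_api_key: bool = False) -> int:
    """Return the HTTP status an anonymous or authenticated search would receive."""
    requested = DEFAULT_PAGE_SIZE if page_size is None else page_size
    if not has_api_key and requested > DEFAULT_PAGE_SIZE:
        return 401  # anonymous requests may not exceed the default page size
    return 200


anonymous_large = status_for_search(page_size=500)
anonymous_default = status_for_search()
authenticated_large = status_for_search(page_size=500, has_api_key=True)
```

Here `anonymous_large` is 401, while the other two are 200, matching the shift from 200 to 401 responses seen for page_size=500 traffic after the deployment.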

With this change, we were able to successfully mitigate the botnet and return our resource consumption to typical levels. This can be seen easily with a few Elasticsearch metrics:

Elasticsearch metrics over the last 12 hours

While the intention behind Openverse is to make openly licensed media easy to access, we don’t currently have the capacity to enable users to access the entire dataset at once. We do plan on exploring options for this in the future.

We’re pleased that this mitigation was successful, and we will continue to be vigilant in ensuring uninterrupted access to Openverse for our users!

#openverse, #infrastructure, #api