Openverse Monthly Priorities Meeting 2023-03-08

Openverse contributors will host a community meeting to discuss priorities for March at 1500 UTC on 2023-03-08.

A sync video chat link will be provided in the #openverse channel of the Making WordPress Chat. We hope to see you there.

You can read the ongoing notes document for these meetings here.

Openverse is now a monorepo

The Openverse frontend and API repositories have been merged into the primary Openverse repository. This change is meant to make it easier to coordinate projects across the Openverse stack and, ultimately, to enable full-stack testing and local development.

The Catalog repository remains standalone and will be moved into the monorepo later this year. Our Infrastructure repository remains private, although we intend to open source this codebase in the future.

Any questions or concerns can be left here or in the monorepo. We continue to update documentation and references to the old repositories as this change rolls out.

Community Meeting Recap (21 February 2023)

[Slack: Meeting start]

This is the first of the Meeting Recap posts to use the new format. From now on, we will only summarize the discussion of the agenda items. To keep up with the progress of Openverse, you can check out the “Week in Openverse” posts.

📒 Agenda

  • Over the weekend (February 18th-19th), we merged the API and the frontend into our main `WordPress/openverse` repo, making it a monorepo. We’re currently buffing out the rough edges, updating documentation, and refining our processes, but the monorepo is very functional.
  • As a result, we will stop development on the WordPress/openverse-api and WordPress/openverse-frontend repos and continue all development in WordPress/openverse.
  • Soon, we will migrate any open issues and rebase any open PRs, and finally archive the old API and frontend repos.

[Slack: Meeting end]

#openverse-weekly-community-meeting

Post-iNaturalist Data Refresh Status

We initiated our first data refresh after adding the entirety of the iNaturalist data this last week, ballooning the catalog from 590m records to ~712m. Unfortunately, the image_view matview refresh step (which normally takes 16 hours) went far longer than expected. Here’s a link to the graph view for the run for those with access.

The matview refresh task hit the timeout of 1 day, but didn’t immediately stop due to a known issue that Rebecca is working on addressing. Typically when this happens, the query completes successfully (albeit well after the intended timeout) but Airflow, having attempted to issue a task failure for the entire duration the task went over the timeout, marks the task as failed. This initiated a retry on the task. Once I noticed the retry attempt, I proceeded to pause the DAG and mark the task as failed. For the same reason as just mentioned, this would not actually halt the query, so I went and executed a pg_terminate_backend on the matview refresh. This took quite some time to actually execute as well.
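Here is a minimal Python sketch for anyone who needs to repeat the manual cleanup described above. It locates a runaway refresh through a DB-API cursor and terminates it. The sketch assumes a psycopg2-style cursor (`%s` placeholders) and a role that is allowed to signal other backends. The names `REFRESH_PATTERN`, `TERMINATE_REFRESH_SQL`, and `terminate_matview_refresh` are hypothetical and are not part of the Openverse codebase. Note that `pg_terminate_backend` returns true once the signal has been sent, which does not guarantee that the backend has already exited.

```python
from typing import Any, List, Tuple

# Hypothetical pattern matching a session whose current statement refreshes image_view.
# It is passed as a query parameter, so its % wildcards never appear in the SQL text.
REFRESH_PATTERN = "REFRESH MATERIALIZED VIEW%image_view%"

# Select active sessions running a matching statement (excluding our own
# connection) and ask PostgreSQL to terminate each backend process.
TERMINATE_REFRESH_SQL = """
SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND query ILIKE %s
  AND pid <> pg_backend_pid()
"""


def terminate_matview_refresh(cursor: Any, pattern: str = REFRESH_PATTERN) -> List[Tuple[int, bool]]:
    """Terminate backends running a matching REFRESH; return (pid, signalled) rows.

    `cursor` is assumed to be a DB-API cursor using the %s paramstyle
    (for example psycopg2). The caller needs superuser or pg_signal_backend rights.
    """
    cursor.execute(TERMINATE_REFRESH_SQL, (pattern,))
    return cursor.fetchall()
```

Filtering on `state = 'active'` avoids killing an idle session whose last statement happened to be a refresh. The `pid <> pg_backend_pid()` clause keeps the sketch from terminating its own connection.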

I wanted to verify whether the query had completed successfully the first time, even though it ran beyond its allotted timeout. I initially tried the rapid count calculation on the view, but it returned ~589m records. To rule out the rapid count relying on tuple statistics that hadn't been updated for the view, I ran a full SELECT COUNT(*) FROM image_view. Sadly, this returned the same number:

deploy@localhost:openledger> select count(*) from image_view;
+-----------+
| count     |
|-----------|
| 589538189 |
+-----------+
SELECT 1
Time: 1623.020s (27 minutes 3 seconds), executed in: 1623.017s (27 minutes 3 seconds)

This meant that the matview refresh did not complete successfully even after 52 hours running, which is quite disheartening.
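The reasoning above can be sketched as a small Python check. The logic is that an exact count still matching the pre-iNaturalist size means the refresh never took effect. This is an illustration under stated assumptions. It assumes the "rapid count" reads PostgreSQL's planner estimate from `pg_class.reltuples`, which is updated mainly by VACUUM and ANALYZE rather than on every change. It also hard-codes the approximate before and after catalog sizes quoted in this post. All function and constant names are hypothetical.

```python
from typing import Any

# Assumption: the "rapid count" is the planner estimate in pg_class.reltuples,
# maintained mainly by VACUUM/ANALYZE, so it can lag behind the real row count.
ESTIMATE_SQL = "SELECT reltuples::bigint FROM pg_class WHERE relname = %s"

# Exact but slow: a full scan (about 27 minutes on image_view in the run above).
EXACT_SQL = "SELECT count(*) FROM image_view"

# Approximate catalog sizes quoted in this post, before and after iNaturalist.
RECORDS_BEFORE = 590_000_000
RECORDS_AFTER = 712_000_000


def estimated_rows(cursor: Any, relname: str = "image_view") -> int:
    """Fast, possibly stale row estimate via a DB-API cursor (%s paramstyle assumed)."""
    cursor.execute(ESTIMATE_SQL, (relname,))
    return int(cursor.fetchone()[0])


def exact_rows(cursor: Any) -> int:
    """Exact row count of image_view; expensive on a table this size."""
    cursor.execute(EXACT_SQL)
    return int(cursor.fetchone()[0])


def refresh_reflects_new_data(view_count: int,
                              before: int = RECORDS_BEFORE,
                              after: int = RECORDS_AFTER) -> bool:
    """True if the view's row count is closer to the post-ingestion size."""
    return abs(view_count - after) < abs(view_count - before)


print(refresh_reflects_new_data(589_538_189))  # False
```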

I think our best bet is to remove retries on this task and increase the timeout to account for the larger data. However, we can't do the latter until we have a sense of how long the query will actually take now. To assess that, I've opened a screen session (named matview-refresh) on the catalog EC2 instance and started the REFRESH MATERIALIZED VIEW CONCURRENTLY image_view command in it. I've also set a reminder to check on it in about 55 hours. Once it completes successfully, we'll have a better understanding of how to proceed.
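The Python sketch below shows roughly what the proposed task change could look like. It derives `execution_timeout` from a measured run plus a margin and disables retries. `retries` and `execution_timeout` are standard Airflow `BaseOperator` arguments. The helper names, the 50% margin, and the 40-hour duration in the example are hypothetical placeholders until the manual refresh finishes.

```python
from datetime import timedelta


def timeout_from_observed_run(observed: timedelta, margin: float = 0.5) -> timedelta:
    """Pad one measured refresh duration with a safety margin (default +50%)."""
    if observed <= timedelta(0):
        raise ValueError("observed duration must be positive")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return observed * (1 + margin)


def matview_refresh_task_kwargs(observed: timedelta, margin: float = 0.5) -> dict:
    """Keyword arguments for the refresh task (both are BaseOperator parameters)."""
    return {
        # No retries: a retry could start a second multi-day REFRESH while the
        # first query is still running in the database.
        "retries": 0,
        # Sized from a measured run instead of the previous 1-day timeout.
        "execution_timeout": timeout_from_observed_run(observed, margin),
    }


# Hypothetical measurement; the real value will come from the manual run.
kwargs = matview_refresh_task_kwargs(timedelta(hours=40))
print(kwargs["execution_timeout"])  # 2 days, 12:00:00
```

The resulting dictionary would be passed to whichever operator runs the refresh, for example as `**matview_refresh_task_kwargs(measured)`. Disabling retries matters as much as the timeout here: as described above, a retry can kick off another refresh attempt while the original query is still executing.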

Given that this is merely the first step of the data refresh, we can probably expect further complications in later steps as well 😅

#airflow #data-refresh

A week in Openverse: 2023-02-13 – 2023-02-20

openverse

Merged PRs

  • #418: Update sync to account changes to PR template path
  • #407: Change ‘Reverted’ to ‘Rollback’ in project docs
  • #406: Remove front matter from project proposal template
  • #405: Fix authenticated logins for the label PR action
  • #402: Update actions/checkout action to v3
  • #401: Restore the functionality of the weekly Make post
  • #398: Bump ipython from 8.3.0 to 8.10.0 in /automations/python
  • #396: Fix project automation logic around closed PRs
  • #366: Proposal: Openverse project process

Closed issues

  • #414: Remove the old header code
  • #413: Add option to sort search results by created_on
  • #276: Omit issues with closed PRs of moving to the “In Progress” column in the project board

openverse-catalog

Merged PRs

  • #994: Add an “Airflow Alert” issue template
  • #969: Add dayshift to tsv filenames for reingestion workflows

Closed issues

  • #1000: Jobe Alert
  • #997:
  • #768: Load_data steps for `image` skipped during Wikimedia reingestion
  • #766: Update to new version of Phylopic API
  • #689: Investigate converting iNaturalist to an incremental DAG
  • #684: inaturalist data quality: issue warning with missing photo ids

openverse-api

Merged PRs

  • #1144: Add screen to API Docker image
  • #1143: Bump django from 4.1.6 to 4.1.7 in /api
  • #1142: Add API rollback workflow
  • #1141: 🔄 synced file(s) with WordPress/openverse
  • #1140: Bump ipython from 8.9.0 to 8.10.0 in /api
  • #1139: Bump ipython from 8.9.0 to 8.10.0 in /ingestion_server
  • #916: Add option to sort search results by `created_on`

openverse-frontend

Merged PRs

  • #2192: Simplify `get-translations.js` and add error handling and fallbacks
  • #2191: Add a directive for translators to not translate Netherlands
  • #2190: 🔄 synced file(s) with WordPress/openverse
  • #2188: Download translations in bulk to prevent GlotPress throttling
  • #2187: Use aria-label for WordPress affiliation link
  • #2185: Move the sidebar in the DOM order
  • #2184: Reduce GlotPress limit further to ensure all languages
  • #2180: Add “skip to content” links to the homepage and the 404 page; fix footer role
  • #2178: Add “skip to content link” to the Single result pages
  • #2177: Use h1 for the main heading on the homepage
  • #2172: Make translations more reliably present in all environments
  • #2169: Update common’s size estimate to 2.5 billion
  • #2162: Remove the unused old header code
  • #2146: Remove the searchBy creator filter from the filters list
  • #2134: Fix the Search Help (Syntax Guide) links

Closed issues

  • #2189: Add translator note for Dutch translation of `search-guide.example.prefix.content`
  • #2183: Improve accessibility of the WordPress link in the footer
  • #2182: Expanding the filters sidebar does not focus a screen reader on the filters section
  • #2179: Default layout should not nest `footer` inside `main`
  • #2176: The first heading on the home page should be `h1`
  • #2170: Not all locales show up in picker
  • #2163: Remove the `old_header` code
  • #2125: Keyboard navigation to the footer on the search page is impossible
  • #2116: Syntax Guide page links don’t work
  • #1344: Creator filter is unclear

openverse-infrastructure

Merged PRs

  • #379: Deploy API 2.7.6
  • #376: Include Make-related secrets in Terraform
  • #375: Update API email address to openverse.org
  • #373: Update @dhruvkb‘s SSH key in `globally_authorized_keys`
  • #372: Sync config with actual infra
  • #371: Include deployment secrets in `WordPress/openverse`
  • #370: Add note about nuxt memory usage after deploy
  • #357: Add CloudWatch agent to API boxes

Closed issues

  • #338: Update API email address to use openverse.org

#openverse, #week-in-openverse

A week in Openverse: 2023-02-08 – 2023-02-15

openverse

Merged PRs

  • #398: Bump ipython from 8.3.0 to 8.10.0 in /automations/python
  • #395: Change authentication token
  • #376: Match frontend linting dependency versions to frontend/pull/2121
  • #369: chore(deps): update alex-page/github-project-automation-plus action to v0.8.3

Closed issues

  • #391: PR Project Automation is failing
  • #389:
  • #387: Baseline SEO improvements
  • #386: iFrame removal

openverse-catalog

Merged PRs

  • #993: 🔄 synced file(s) with WordPress/openverse
  • #974: Update Europeana endpoint
  • #969: Add dayshift to tsv filenames for reingestion workflows

Closed issues

  • #1000: Jobe Alert
  • #997:
  • #768: Load_data steps for `image` skipped during Wikimedia reingestion
  • #766: Update to new version of Phylopic API
  • #109: Update Europeana endpoint and accomodate v2 API changes

openverse-api

Merged PRs

  • #1140: Bump ipython from 8.9.0 to 8.10.0 in /api
  • #1139: Bump ipython from 8.9.0 to 8.10.0 in /ingestion_server
  • #1135: Bump cryptography from 39.0.0 to 39.0.1 in /api
  • #1082: Add zero-downtime deployments & data transformations guide

Closed issues

  • #1030: Add documentation describing the data migration process we should follow

openverse-frontend

Merged PRs

  • #2184: Reduce GlotPress limit further to ensure all languages
  • #2177: Use h1 for the main heading on the homepage
  • #2172: Make translations more reliably present in all environments
  • #2169: Update common’s size estimate to 2.5 billion
  • #2166: Update URL in opensearch.xml
  • #2165: Group localized URLs of the same page in the sitemap
  • #2161: Ensure the lineage is traced correctly
  • #2160: Fix homepage zooming on iPhone
  • #2158: Minify homepage images
  • #2157: Baseline SEO improvements
  • #2156: 🔄 synced file(s) with WordPress/openverse
  • #2155: Fix the header scrolling
  • #2154: Enable new header feature flag in production
  • #2149: Remove ring offset to reduce ring thickness
  • #2147: Tighten the condition for audio playback to continue across pages
  • #2144: Increase horizontal padding in mobile search grid
  • #2140: Update content switcher tabs fonts and icons to match the mockups
  • #2121: Update eslint- and prettier- related dependencies
  • #2120: Update babel-related dependencies
  • #2109: Extract Prometheus module
  • #2104: Add Debounce to filter selection
  • #2101: Update search term when navigating with browser`s back and forward buttons
  • #2093: Serve Prometheus metrics on a separate port; fix metrics development workflow

Closed issues

  • #2176: The first heading on the home page should be `h1`
  • #2170: Not all locales show up in picker
  • #2164: Sitemap is not correct for i18n routes
  • #2159: Missing Translation Strings
  • #2151: Browser zoom-in the search input on mobile when typing
  • #2150: The header is not fixed when scrolling to bottom
  • #2133: Increase padding of search results grid on mobile size
  • #2129: Focus outline thickness is incorrect in the ‘Clear filters’ button
  • #2118: Audio keeps playing on a single image result
  • #2082: Add the search term as a query parameter to the single result page
  • #2010: Minimize the use of JS for layout in the VAudioTrack component
  • #2009: Menu and breakpoint improvement in new header
  • #1295: Homepage search type buttons for small width viewports run off screen in languages with longer labels
  • #810: Requests to invalid / non-existent resources should return a 404 HTTP status
  • #811: Photos > Images should use a server-side, not client-side redirect
  • #812: All pages should output a canonical URL tag
  • #813: Add hreflang directives
  • #505: Debounce filter selection

openverse-infrastructure

Merged PRs

  • #373: Update @dhruvkb‘s SSH key in `globally_authorized_keys`
  • #372: Sync config with actual infra
  • #371: Include deployment secrets in `WordPress/openverse`
  • #370: Add note about nuxt memory usage after deploy
  • #369: Bump API to v2.7.5
  • #368: Bump catalog-airflow from v1.5.0 to v1.5.1
  • #367: 🔄 synced file(s) with WordPress/openverse
  • #365: Split next root modules by service

Closed issues

  • #347: Split `next` root modules’ `main.tf` into separate files for each service

#openverse, #week-in-openverse

Openverse.org is live

Openverse is pleased to have found a permanent home at https://openverse.org. The previous domain, https://wordpress.org/openverse, will redirect to the new standalone domain.

Openverse is a proud part of the WordPress project. We’re excited to continue delivering improvements to the search engine and to follow our path toward deeper integration with WP core in 2023.

X-post: Apply to Attend the 2023 Community Summit

X-comment from +make.wordpress.org/community: Comment on Apply to Attend the 2023 Community Summit

Community Meeting Recap (07 February 2023)

[Slack: Meeting start]

🎉 Done!

👀 Needs review

🚧 In progress/To Do

📒 Agenda

[Slack: Meeting end]

#openverse-weekly-community-meeting

Openverse is Moving!

The Openverse project is switching homes to a standalone domain: https://openverse.org. This move will be completed on February 8th, 2023.

Along with the move from wordpress.org/openverse to openverse.org, we are shipping improvements to the site’s homepage, header, and footer.

WordPress.org/openverse will redirect to the new domain and users will continue to be able to access Openverse from the WordPress.org header. Openverse remains an important addition to the WordPress project. For our contributors, Openverse’s staging site has already moved to staging.openverse.org as announced earlier this year.

We’re excited for the improvements to SEO and usability introduced by this change.