Openverse Monthly Priorities Meeting 2023-02-08

Openverse contributors will host a community meeting to discuss priorities for February/March at 1500 UTC on 2023-02-08.

A sync video chat link will be provided. We hope to see you there.

You can read the ongoing notes document for these meetings here.

Openverse.org is live

Openverse is pleased to have found a permanent home at https://openverse.org. The previous domain https://wordpress.org/openverse will redirect to the new standalone domain.

Openverse is a proud part of the WordPress project. We’re excited to continue to deliver improvements to the search engine and follow our path towards deeper integration with WP core in 2023.

X-post: Apply to Attend the 2023 Community Summit

X-comment from +make.wordpress.org/community: Comment on Apply to Attend the 2023 Community Summit

Community Meeting Recap (07 February 2023)

[Slack: Meeting start]

🎉 Done!

👀 Needs review

🚧 In progress/To Do

📒 Agenda

[Slack: Meeting end]

#openverse-weekly-community-meeting

Openverse is Moving!

The Openverse project is switching homes to a standalone domain: https://openverse.org. This move will be completed on February 8th, 2023.

Along with the move from wordpress.org/openverse to openverse.org, we are shipping improvements to the site’s homepage, header, and footer.

WordPress.org/openverse will redirect to the new domain and users will continue to be able to access Openverse from the WordPress.org header. Openverse remains an important addition to the WordPress project. For our contributors, Openverse’s staging site has already moved to staging.openverse.org as announced earlier this year.

We’re excited for the improvements to SEO and usability introduced by this change.

Community Meeting Recap (31 January 2023)

[Slack: Meeting start]

🎉 Done!

👀 Needs review

🚧 In progress/To Do

📒 Agenda

[Slack: Meeting end]

#openverse-weekly-community-meeting

Preparing for iNaturalist

Today we were able to merge some massive and significant changes contributed by @beccawidom to the iNaturalist DAG! This PR includes a number of changes, namely:

  • The transformation steps have changed from “CSV -> Postgres -> TSV -> Postgres” to “CSV -> Postgres -> Postgres”. This significantly reduces disk space, time, and processing overhead, and was a necessary change in order to process all of the iNaturalist data in a reasonable timeframe. It also serves as a proof-of-concept for future bulk data imports, since the transformation & data cleaning steps are happening entirely in SQL (an Openverse first!).
  • Images are now connected with the Catalog of Life, which provides English vernacular names. This should help improve search relevancy over the current scientific names.

I want to take a moment to celebrate this huge accomplishment, and the tremendous effort @beccawidom poured into it. Thank you!


Now that this DAG is ready to be run once again, we’re faced with the impressive and daunting notion that we could, in a matter of days, increase the size of the image catalog by ~137 million (a roughly 23.3% increase in size). With that information, it’s important to consider the implications of including this data.

We have a weekly image data refresh process which transfers images from the catalog into our API for public use. Presently, this data refresh takes around 47 hours without the popularity recalculation and 60 hours with the popularity recalculation. If we are to assume these times are linear, we can expect those times to become 58 hours and 74 hours respectively. Since these are run weekly, this still gives us about 100 hours left in the week before we start having data refreshes queued while previous ones are running.
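For anyone who wants to check the arithmetic, here is a minimal sketch of the linear projection described above. It only assumes that refresh time scales in direct proportion to catalog size, which may not hold in practice; the function name is illustrative and not part of any Openverse code.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def project_duration(current_hours: float, growth_fraction: float) -> float:
    """Scale a refresh duration linearly with the growth in catalog size."""
    return current_hours * (1 + growth_fraction)


# ~23.3% more images, taken from the estimate in this post.
growth = 0.233

without_popularity = project_duration(47, growth)
with_popularity = project_duration(60, growth)

# Hours left in the week once each kind of refresh has finished.
headroom_without = HOURS_PER_WEEK - without_popularity
headroom_with = HOURS_PER_WEEK - with_popularity

print(round(without_popularity), round(with_popularity))  # 58 74
print(round(headroom_with), round(headroom_without))      # 94 110
```

Under that assumption the remaining headroom lands between roughly 94 and 110 hours, depending on whether the popularity recalculation runs, which is where the "about 100 hours" figure comes from.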

Here are some steps we can take to monitor the process:

  1. Take a manual database snapshot of the catalog prior to enabling the iNaturalist DAG.
  2. Enable the DAG shortly after the weekly data refresh has completed. This will allow iNaturalist to run without other significant database operations occurring.
  3. Disable the DAG after the run while we verify the following steps.
  4. Monitor the next scheduled image data refresh closely for significant aberrations in step duration.
  5. Make a number of searches after the data refresh is complete to see how results are affected. We can make a number of searches which we would expect to return iNaturalist data (e.g. cat, mushroom, alligator) and some we expect should not (e.g. computer, transistor, book).
  6. Re-enable the iNaturalist DAG.
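The search check in step 5 could be made a little more systematic with something like the sketch below. It is only an illustration: it assumes each search result is a dict carrying a `source` field that names the provider, the 20% threshold is an arbitrary placeholder, and the sample results are made up rather than real API output.

```python
def provider_share(results: list[dict], provider: str) -> float:
    """Return the fraction of results attributed to the given provider."""
    if not results:
        return 0.0
    # Assumption: each result names its provider under the "source" key.
    matches = sum(1 for result in results if result.get("source") == provider)
    return matches / len(results)


def flooded_queries(
    results_by_query: dict[str, list[dict]],
    provider: str,
    unrelated_queries: list[str],
    max_share: float = 0.2,  # arbitrary threshold, for illustration only
) -> list[str]:
    """List the unrelated queries where the provider exceeds max_share."""
    return [
        query
        for query in unrelated_queries
        if provider_share(results_by_query.get(query, []), provider) > max_share
    ]


# Made-up sample data standing in for the first page of each search.
sample = {
    "cat": [{"source": "inaturalist"}, {"source": "flickr"},
            {"source": "inaturalist"}, {"source": "flickr"}],
    "computer": [{"source": "flickr"}, {"source": "inaturalist"},
                 {"source": "inaturalist"}, {"source": "wikimedia"}],
    "book": [{"source": "flickr"}, {"source": "wikimedia"}],
}

print(provider_share(sample["cat"], "inaturalist"))                    # 0.5
print(flooded_queries(sample, "inaturalist", ["computer", "book"]))    # ['computer']
```

A query such as "computer" showing up in the flagged list would be the kind of signal that prompts a closer look at relevancy before re-enabling the DAG.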

One of our big-picture goals for 2023 is search relevancy, and a key piece required for making improvements in that area is understanding how our existing document scoring works. I’m not sure that we can predict how adding this much data will affect our result relevancy. In the case where we notice result relevancy is negatively impacted (e.g. unrelated queries are flooded with iNaturalist results), there are a few actions we can take to mitigate this:

  • Alter the weight of the provider in the API (@sarayourfriend had mentioned this as an option).
  • Set the authority boost of the provider in the ingestion server and reindex the images.
  • Disable the iNaturalist provider in the API.

We would like to do all we can to avoid the last option. I don’t presume that the iNaturalist data will require taking the above actions, but I wanted to outline them and open up space in case other folks have mitigation ideas.

We’re incredibly excited for the addition of this data!

#catalog #database

Openverse switches to Photon for thumbnail generation

Openverse has moved from a self-hosted Imaginary instance for thumbnail generation to using Photon. Photon is a fast and flexible open-source image service with powerful tools for cropping, resizing, and filtering images.

The hosted instance of Photon we’re connecting to is provided to all Jetpack-connected WordPress sites, or sites hosted on the WordPress.com platform. As we’ve been granted permission to use this for the Openverse frontend and API, other instances of the Openverse API, should you choose to host one, would need to connect to a different endpoint to comply with Photon’s Terms of Service. The Openverse API is easily configurable to switch to a different thumbnail proxy endpoint.

Please let us know if you encounter any issues with thumbnails on wordpress.org/openverse or in our API responses. Any feedback or concerns can be shared as comments on this post or as a GitHub issue. Thank you!

Community Meeting Recap (10 January 2023)

🗓️ Note: There will be no meeting for the next two weeks. The normal schedule will resume on January 31st.

[Slack: Meeting start]

🎉 Done!

👀 Needs review

🚧 In progress/To Do

📒 Agenda

  • There were no items to discuss.

[Slack: Meeting end]

#openverse-weekly-community-meeting

Openverse’s staging site has moved

Openverse’s staging site, where contributors evaluate their recently-merged work and prepare new releases of Openverse, has moved from search-staging.openverse.engineering to staging.openverse.org, following the project’s acquisition of the openverse.org domain name at the end of last year. The new name is simpler to remember and to type.

For the foreseeable future, the old .engineering domain name will redirect to the new one. At this time, this change only affects the staging frontend site. Any other domain name changes to Openverse services will be announced at a future date.

If you see any references to the old domain name in our documentation, or elsewhere on the web, please let us know.