Community Meeting Recap (2023-07-04)

[Meeting Start]

📝 Agenda

In this meeting we discussed the Openverse dataset, specifically in the context of sharing it with interested parties. We were joined by Aaron Gokaslan (MosaicML, Cornell University) and Apolinário (Hugging Face). The following discussion topics emerged.

  • Deduplication
    • Openverse doesn’t yet do any deduplication, but we have plans to explore it.
    • We also discussed what metrics to consider when determining the canonical and best version of any media item and how aggregating disparate information for duplicates across platforms can improve relevance. [Slack]
  • Synthetic captions
    • MosaicML has an AI captioning pipeline to generate synthetic captions in addition to the ground truth.
    • Openverse could utilise these captions because metadata useful for ML applications, like generated labels, would also be useful to human audiences. [Slack]
  • Dataset updates
    • Openverse is constantly collecting new metadata and identifying new images.
    • In the HF Datasets format, new data could simply be appended to the dataset, while updating existing records would essentially mean rebuilding it.
    • We discussed different approaches to handling newly added records vs updates to existing records. [Slack]
  • Academic paper
    • There are no technical academic papers referencing Openverse, either published by us or anyone else. Madison did give a talk at PyData that was related to this.
    • We discussed that publishing a paper might be interesting, and Madison’s slides could be a good jumping-off point. [Slack]
  • Conclusions
    • We’re continuing the discussion on GitHub.
    • Contributions to the conversations are welcome.
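
To make the "Dataset updates" point above more concrete, here is a minimal Python sketch of treating new records and updates differently. It is not the approach agreed in the meeting, only an illustration. The `identifier` field, the `split_batch` and `apply_batch` helpers, and the list-of-dicts dataset are all hypothetical, and the sketch assumes each identifier appears at most once per incoming batch.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: a flat dict with an "identifier" key.
Record = Dict[str, str]


def split_batch(
    existing_ids: Set[str], incoming: Iterable[Record]
) -> Tuple[List[Record], List[Record]]:
    """Split a batch into brand-new records and updates to known records."""
    new_records: List[Record] = []
    updates: List[Record] = []
    for record in incoming:
        if record["identifier"] in existing_ids:
            updates.append(record)
        else:
            new_records.append(record)
    return new_records, updates


def apply_batch(dataset: List[Record], incoming: Iterable[Record]) -> List[Record]:
    """Return a dataset with updates applied and new records appended."""
    existing_ids = {record["identifier"] for record in dataset}
    new_records, updates = split_batch(existing_ids, incoming)
    if updates:
        # Updates touch existing rows, so the whole dataset is rebuilt.
        latest = {record["identifier"]: record for record in updates}
        dataset = [latest.get(record["identifier"], record) for record in dataset]
    # New records are the cheap case: they only need to be appended.
    return dataset + new_records
```

In this sketch a batch containing only new records never triggers the rebuild branch, which mirrors the cost difference between appending and updating noted in the discussion.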

🔔 Reminder

Openverse contributors will host a sync video meeting to discuss priorities for July at 1500 UTC on July 5th 2023, links for which will be posted in the #openverse channel of the Making WordPress Chat.

[Meeting End]