This page will be moved to the handbook when the procedure is stable.

Introduction

Typing a subtitle file from scratch takes a lot of time, so let’s get some help from AWS Transcribe. It gives us a starting TTML (subtitle) file in which the timing is already set, so only the phrases and words need to be reviewed before the final upload to WordPress.tv.

The process in short

  1. Choose a video to subtitle on WordPress.tv and copy the link to the video (under Download > Med)
  2. Go to https://go.wptv.club/subtitles-and-transcripts/ and enter the details
  3. Check your email and wait for the subtitle file
  4. Review the TTML (subtitle) file
  5. Submit the subtitle file on WordPress.tv for approval

Current restrictions

  • Only en_US videos for now
  • Only .mp4
  • Privacy: email and name are not yet anonymized when the job is finished
  • The ‘WordPress vocabulary’ is still limited. The more commonly used words we add to our environment, the more accurate the speech-to-text will become.

Test for yourself

  • On https://WordPress.tv search for a video that you want to subtitle
  • In the sidebar, in the ‘Download’ part, find the ‘Medium .mp4’ video and copy that link somewhere (it will start with https://videos.files.wordpress.com/…)
  • Go to https://go.wptv.club/subtitles-and-transcripts/ and enter your name, email and the link from the previous item.
  • You will get an initial email confirming that the request was received. Depending on how many other requests are in progress, you will get the TTML file back in 30 to 90 minutes.
  • Review the TTML file (see the next section)
  • Submit the TTML file for approval in the sidebar of the video on WordPress.tv (‘Subtitle this video’)
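
Before filling in the form, the copied link can be checked against the restrictions listed earlier. The following is only a minimal sketch: the function name is made up for illustration, and it assumes that a supported link always uses the https://videos.files.wordpress.com/ host and ends in .mp4. It cannot check the language of the video.

```python
from urllib.parse import urlparse

# Host taken from the description above; treated here as an assumption.
EXPECTED_HOST = "videos.files.wordpress.com"


def looks_like_supported_video_link(link):
    """Return True if the link looks like a 'Medium .mp4' download link."""
    parsed = urlparse(link.strip())
    return (
        parsed.scheme == "https"
        and parsed.netloc == EXPECTED_HOST
        and parsed.path.lower().endswith(".mp4")
    )


if __name__ == "__main__":
    # Hypothetical link, for illustration only.
    example = "https://videos.files.wordpress.com/abc123/example-talk.mp4"
    print(looks_like_supported_video_link(example))
```

A link that fails this check is most likely the page URL of the video rather than the download link from the sidebar.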

How to review a subtitle TTML file

If you open this file in a text editor and only correct the words, the file will be ready for immediate upload. Keep the following in mind:

  • The text should contain valid HTML characters (so not ‘<’ but ‘&lt;’, a double quote should be ‘&quot;’, etc.)
  • The &ndash; character entity is not allowed in TTML.
  • A ‘<br />’ means that a long line is split over 2 lines.
  • If there is just one word on a line, you can move it to the previous line. Just don’t forget to delete the full line where that word was before.
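
The warnings above can be partly checked with a small script before uploading. This is a rough sketch, not part of the official procedure: the function names are hypothetical, and it assumes the TTML file is plain XML that Python's standard library can parse. It escapes text the way the first warning describes and reports a file that contains &ndash or is no longer well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_subtitle_text(text):
    """Escape &, <, > and double quotes in a corrected phrase."""
    return escape(text, {'"': "&quot;"})


def check_ttml(ttml_source):
    """Return a list of problems found in a TTML document given as a string."""
    problems = []
    # &ndash is not allowed in TTML (see the warning above).
    if "&ndash" in ttml_source:
        problems.append("contains &ndash, which is not allowed in TTML")
    # A raw '<' or '&' in the text makes the file fail to parse as XML.
    try:
        ET.fromstring(ttml_source)
    except ET.ParseError as error:
        problems.append(f"not well-formed XML: {error}")
    return problems


if __name__ == "__main__":
    # Tiny hand-written sample; a real file from the service will be longer.
    sample = (
        '<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
        '<p begin="00:00:01.000" end="00:00:03.000">'
        "Hello &amp; welcome<br />to WordCamp</p>"
        "</div></body></tt>"
    )
    print(check_ttml(sample))
    print(escape_subtitle_text('Plugins < themes & "blocks"'))
```

An empty list from check_ttml only means that these two checks passed. The words themselves still need a human review.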

How can you help us in improving?

  • Does a video fail to process? Please send us the link so we can try it ourselves.
  • Does the upload of your final subtitle file fail? Please send us the file so we can have a look.
  • Did you correct the same word several times, and is it a common word in the WordPress vocabulary? Then please send us those words so we can add them to the list! The next video you run through this procedure will then have more words correct.