It was pointed out to me that I never mentioned it anywhere when I made this change last month, but the plugin search engine at http://wordpress.org/extend/plugins/ has been much improved. So now when you search for things like “buddypress”, you should get what you’re looking for on the first page of results more often.
It was a minor adjustment, so it didn’t occur to me to tell anybody. Sorry about that.
Valentinas 10:45 pm on December 22, 2010 Permalink
Well try to search for ad-minister. Other search strings to test: wp-e-commerce, wp-ad-manager, basically anything that has wp prefix will produce the same search results. And you know there’s plenty of plugins with that prefix.
Otto 10:58 pm on December 22, 2010 Permalink
The search engine treats dashes as search operators, meaning foo-bar searches for items containing foo but NOT bar.
Leave out the dashes when searching, for now.
Jane Wells 5:03 am on December 23, 2010 Permalink
Maybe drop a text instruction to that effect on the plugins search page? People search for wp e-commerce all the time.
Otto 12:25 pm on December 23, 2010 Permalink
Jane: I was planning on just fixing it so it didn’t do that, actually. Nacin’s idea was the same one I was planning on implementing.
Otto 1:05 pm on December 23, 2010 Permalink
Done. Hyphens now get escaped properly for the search when they don’t have spaces around them. So “e-commerce” will actually search for that as a word instead of considering it to be a control character. You can still use a hyphen in front of a word for exclusion, as long as it doesn’t have some other word character in front of the hyphen.
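As a rough illustration of the rule described above, here is a minimal Python sketch. It is not the actual wordpress.org code: the function name is hypothetical, and it assumes that "escaping" means putting a backslash in front of any hyphen that sits directly between two word characters, while a hyphen at the start of a term is left alone as an exclusion operator.

```python
import re

# Hypothetical helper, not the real plugin directory code.
# Matches a hyphen that has a word character on both sides,
# e.g. the hyphens in "wp-e-commerce" but not the one in "contact -spam".
_INWORD_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")

def escape_inword_hyphens(query: str) -> str:
    """Backslash-escape in-word hyphens so they are not read as operators."""
    # In the replacement template, "\\" is a literal backslash.
    return _INWORD_HYPHEN.sub(r"\\-", query)

print(escape_inword_hyphens("wp-e-commerce"))   # in-word hyphens escaped
print(escape_inword_hyphens("contact -spam"))   # leading hyphen untouched
```

With this approach a query like "e-commerce" reaches the engine as a single term, while "contact -spam" still excludes "spam".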
Andrew Nacin 12:03 am on December 23, 2010 Permalink
We can probably make it convert hyphens into spaces if the hyphen isn’t preceded by a space. I’ll check it out.
Bill Erickson 12:49 am on December 23, 2010 Permalink
When I search for Contact Form 7, the plugin shows up on the second page, even though I searched for its exact name. This has been bugging me for about a month.
Otto 1:47 am on December 23, 2010 Permalink
Yep. But, that’s a pretty generic title too. I didn’t say it was perfect, just better.
Jane Wells 5:03 am on December 23, 2010 Permalink
Actually, I think it used to come up as one of the first results, so for that one it looks like a loss. There are a couple I can’t find by searching now. What was changed to the search/what’s it searching now? If there are things we can do to the plugin metadata or readme files or whatever to make them more findable with the new search, we can publicize it.
Otto 12:29 pm on December 23, 2010 Permalink
Basically, I added extra weighting to the title, gave the tags a bit more weight, and turned on the “extended” matching mode, which makes it actually use the weights. Before it was in a simple text-only keyword search mode, which didn’t use any relevance or closeness matching at all.
The thing is that “Contact Form” is a fairly generic name, and while it does give the title a lot of boosting, every single one of the results on the first two pages has “Contact Form” in its title.
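To make the tie between generic titles concrete, here is a toy Python model of field weighting. It is only a sketch: the weight values and field names are assumptions, and real Sphinx ranking is considerably more involved than counting term hits per field.

```python
# Toy model only. The weights below are made up for illustration;
# the thread only says titles get a lot of weight and tags a bit more.
FIELD_WEIGHTS = {"title": 10, "tags": 3, "description": 1}

def toy_score(query: str, plugin: dict) -> int:
    """Sum, over fields, of weight times the number of query terms found."""
    terms = query.lower().split()
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        words = plugin.get(field, "").lower().split()
        score += weight * sum(1 for term in terms if term in words)
    return score

base = {"title": "Contact Form 7", "tags": ""}
addon = {"title": "Contact Form 7 Addon", "tags": ""}
tagged_addon = {"title": "Contact Form 7 Addon", "tags": "contact form"}

query = "contact form 7"
print(toy_score(query, base), toy_score(query, addon), toy_score(query, tagged_addon))
```

In this model the plain title and the addon title score the same, and an addon that also carries matching tags scores higher, which is the kind of near-tie described above.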
Sergey Biryukov 3:25 pm on December 23, 2010 Permalink
Perhaps if there’s an exact match, it can go first?
Valentinas 10:37 pm on December 23, 2010 Permalink
Yes, I wanted to suggest the same as Sergey: an exact match should definitely be first.
Otto 10:52 pm on December 23, 2010 Permalink
Well, that’s not the easiest thing to do in the world. The question is what do you define as an exact match?
The Sphinx search engine works based on phrase matching. Now, in the particular case of “Contact Form 7”, the 7 is pretty much being ignored because it’s too short. So we’re really talking about “Contact Form” here. Note that it wouldn’t much matter even if we were talking about the full “Contact Form 7”, because several of the other plugins above it also have “Contact Form 7” in their titles.
And that’s the trick. The whole thing is based on a weighting algorithm. I gave titles a lot of weight specifically in order to push title matches up to the top, but in this case, all the ones that show up in the top 20 results have “Contact Form” in their titles. Several of them have “Contact Form 7” in their titles as well, because they’re addons to it or what have you. So how is the search really supposed to know that “Contact Form 7” is more important than “Contact Form 7 Addon”? Sphinx doesn’t give extra weight to “whole” titles.
I also give a bit of extra weight to tags, which is probably why some of those are coming up a bit higher than others. The relevance scores are all pretty close right there.
Basically, doing exact whole-phrase matches in Sphinx is kind of a hacky PITA, involving adding extra data to the database using delimiters for before and after and such. I’d prefer to get the 90% solution where people are searching for keywords and titles and having it get close-enough rather than adding a ton of extra data just to cover that one particular case. Especially when the case involves a really generic title like “Contact Form”.
So Protip for code authors: Come up with a unique name to be listed higher in searches. This is not specific to our search engine.
Matt 10:57 pm on December 23, 2010 Permalink
We’re not really discussing matching, but ranking. I think it’s fair to, given the results that Sphinx returns, re-order based on a metric like popularity or bump exact match (search == title) to the top.
Otto 10:59 pm on December 23, 2010 Permalink
Popularity ordering is already there if somebody wants to use it: http://wordpress.org/extend/plugins/search.php?q=Contact+Form+7&sort=popular
Matt 11:01 pm on December 23, 2010 Permalink
I guess it’s odd to me that popularity and rating are not part of relevance — if you think outside of the strict-matching sense of relevance, they’re all really one and the same. Whichever provides the best results to the most people should just be the default.
Otto 11:11 pm on December 23, 2010 Permalink
Ordering by rating is already there too: http://wordpress.org/extend/plugins/search.php?q=Contact+Form+7&sort=top-rated
But I find the other side of this odd, really. It doesn’t make a whole lot of sense to me that popularity and rating have anything to do with relevance. Relevance is really more a measure of how well the plugin’s description/name/tags/whatever fits the keywords/phrase you’re searching for. In other words, relevance is about the plugin itself and your search query, not about user-generated meta data about the plugin like popularity or ratings.
Could we include rating and popularity as a metric? Probably, but I’d have to investigate Sphinx in more depth. Those fields are not stored in the Sphinx database right now for sure, so it can’t include them in the relevance matching algorithm.
Otto 11:14 pm on December 23, 2010 Permalink
Scratch that, they are in there (rating and downloads over the last week), so they can be used. I’ll do some testing, see if it makes any difference.
Matt 11:16 pm on December 23, 2010 Permalink
I don’t think most people think of relevance that way. The core bug is you type in the name of one of the top ten most used plugins in the world and it doesn’t come up in the first page.
Type “akismet” and Akismet is #5.
Type “stats” and you see a page with 2 results, and 40 pages. (The counting/paging bug mentioned.)
My goal is to not give people multiple ranking options for the plugin search, just to have one that always gives you the results you’re looking for.
Otto 11:19 pm on December 23, 2010 Permalink
Yes, but everything is a tradeoff. At what cost do we bump more popular plugins? The cost is that less popular, but possibly better or more useful, plugins get bumped down.
Moving exact whole-title matches to the top is one thing (technically tricky, actually, but doable), but counting popularity and rating in seems different to me.
Andy Skelton 11:25 pm on December 23, 2010 Permalink
Stats and Akismet are good test cases. Almost any plugin can legitimately use the word “stats” in its description and many could reference Akismet, but a title including such words indicates extreme relevance. Titles are always more relevant than descriptions.
Otto 11:27 pm on December 23, 2010 Permalink
This is the problem with generic phrases. “stats” has 591 results over 74 pages, and the first several dozen (at least) have the word “stats” in their titles.
I’m doing more testing/tweaking to it now.
Otto 11:29 pm on December 23, 2010 Permalink
Also, the title of the plugin is actually “WordPress.com Stats”, so an exact matching check wouldn’t help it out there for “stats” anyway. Slugs are not included in the search system at all.
Otto 12:13 am on December 24, 2010 Permalink
Still looking at how to bump an exact title match to the top (this requires bypassing the search engine results to make it work), but after looking at the weights, I can at least tell you some specifics of why you see the results you see.
Akismet is 5th because the other 4 above it include “Akismet” as a tag. So they get bumped for that.
WordPress.com Stats: Same problem. The ones above it have tags with “stats” in them.
Andy Skelton 1:19 am on December 24, 2010 Permalink
WordPress.com Stats has “stats” as a tag.
Valentinas 3:33 am on December 24, 2010 Permalink
Some thoughts on why akismet should be first:
Download count (maybe take into account how many copies were downloaded recently). So up to 10,000 downloads: 0%, 10k-100k: 0.1%, 100k-1M: 1%, 1M-10M: 10%, and so on.
Vote count (probably the same as download count, just different numbers: 100 votes: 1%, 500 votes: 10%, and so on).
Vote value (only take into account if it has 50 or more votes): 1 star: -10%, 2: -5%, 3: 0%, 4: +5%, 5: +10%.
Compatibility (also only take into account if it has 10 or more votes)…
and so on..
I think these properties are a good way to tell whether a plugin is good or not. So by combining them with relevance you should be able to get good results. Of course, this kind of stuff requires fine tuning.
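The scheme above could be sketched in Python roughly as follows. This is only one reading of it, under stated assumptions: the boosts are added together and applied as a multiplier on the relevance score, everything from 1M downloads up gets the 10% tier, and the vote-count and compatibility factors are left out because their thresholds are not fully spelled out. All function names are hypothetical.

```python
def download_boost(downloads: int) -> float:
    """Download tiers as listed above; tiers past 10M are unspecified there."""
    if downloads >= 1_000_000:
        return 0.10
    if downloads >= 100_000:
        return 0.01
    if downloads >= 10_000:
        return 0.001
    return 0.0

# Star rating to boost, per the "vote value" line.
STAR_BOOST = {1: -0.10, 2: -0.05, 3: 0.0, 4: 0.05, 5: 0.10}

def rating_boost(avg_stars: float, votes: int) -> float:
    """Only counts once a plugin has 50 or more votes."""
    if votes < 50:
        return 0.0
    stars = min(5, max(1, round(avg_stars)))
    return STAR_BOOST[stars]

def adjusted_score(relevance: float, downloads: int, avg_stars: float, votes: int) -> float:
    # Assumption: boosts are summed and applied multiplicatively.
    return relevance * (1.0 + download_boost(downloads) + rating_boost(avg_stars, votes))

print(adjusted_score(100.0, 2_000_000, 5.0, 200))
print(adjusted_score(100.0, 5_000, 1.0, 10))
```

A heavily downloaded, well-rated plugin gets about a 20% lift here, while a small plugin with too few votes keeps its raw relevance score.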
As Matt mentioned, you should test the search engine with top plugins. Also, do you have analytics running on the plugins directory? It would be a good idea to look at which keywords bring people to certain plugins and test with them.
Peter Westwood 7:53 am on December 24, 2010 Permalink
To me the preferences for sorting would be:
1. Exact Match in Title
2. Order by install count / rating – download count is meaningless to me as it is too easily affected by having 100s of plugins release versions for minor buglets
jb510 8:57 am on December 24, 2010 Permalink
Thinking a little outside the box, Otto is right that each search category (rel, new, upd, pop, rating) is there, but what I really want is the ability to combine those types of search. I want the radio buttons to be check boxes so I can search for “relevance” AND “popularity”.
Also, I wonder if you couldn’t add the ability to search for exact name matches by entering the plugin name in quotes, i.e. “Contact Form 7”?
Valentinas 9:23 am on December 24, 2010 Permalink
jb510, that would be useful for advanced users, but what I’d like to have is something like Google (I think we can agree that Google is an example of good search). You don’t have any “popularity” or “relevance” options in Google; all you do is enter the name (like the aforementioned “contact form 7”) and you get the result. Google sells their search engine (http://www.google.com/enterprise/search/mini.html), but that may not be an option for WP, since it’s quite expensive (starts from $3000).
Other than that, the idea about the ability to search for an exact name sounds pretty good (maybe even with a fallback to regular search that informs the user: “we couldn’t find an exact match for your query, but here are some similar results”).
Otto 11:11 pm on December 29, 2010 Permalink
Exact name/slug matches (or fairly close) now get bumped to the top of search results.
In doing this, I discovered a case where results can be duplicated as well. It’s been there a while, but nobody noticed. That will take more time to figure out how to clean up.
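For readers curious what bumping exact matches might look like as a post-processing step, here is a minimal Python sketch. It is not the code that was deployed: the function names and the result shape (dicts with "name" and "slug" keys) are assumptions, and it handles only exact matches after simple normalization, not the "fairly close" ones.

```python
import re

def _norm(text: str) -> str:
    """Lowercase and collapse anything non-alphanumeric into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def bump_exact_matches(query: str, results: list[dict]) -> list[dict]:
    """Move results whose name or slug equals the query to the front."""
    q = _norm(query)
    # sorted() is stable, so the engine's order is kept within each group.
    # The key is False (sorts first) for exact matches, True otherwise.
    return sorted(results, key=lambda r: q not in (_norm(r["name"]), _norm(r["slug"])))

results = [
    {"name": "Contact Form 7 Addon", "slug": "contact-form-7-addon"},
    {"name": "Some Contact Form", "slug": "some-contact-form"},
    {"name": "Contact Form 7", "slug": "contact-form-7"},
]
for r in bump_exact_matches("Contact Form 7", results):
    print(r["name"])
```

Because the sort is stable, everything that is not an exact match stays in whatever order the search engine returned it.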
Otto 10:30 pm on January 4, 2011 Permalink
Duplicate search results removed. These were rare, but they’re gone now.
Rich Pedley 8:15 am on December 23, 2010 Permalink
either the number of search results is still incorrect, or we are missing page links.
Otto 12:38 pm on December 23, 2010 Permalink
The page links code was slightly off. It didn’t have the correct number set for its “per_page” count, so it had the wrong number of pages. I’ve corrected it.
Rich Pedley 3:26 pm on December 23, 2010 Permalink
nope still broke. search for eshop:
Showing 1-6 of 10 plugins with 6 displayed
page shows 9-9 of 10 plugins , and only 1 displayed
and why 6 per page, seems a weird amount.
Rich Pedley 3:28 pm on December 23, 2010 Permalink
sorry that should have been page 2 shows 9-9etc
Otto 3:30 pm on December 23, 2010 Permalink
I don’t know what you’re seeing, but for “eshop” it shows me 1-8 out of 10, with 2 pages. Works properly.
Otto 3:33 pm on December 23, 2010 Permalink
Ahh, okay, actually you’re seeing different results because you are not an admin who can see the hidden plugins. Basically, those are plugins that have been entered and approved but for which no code has been uploaded yet. So they are getting removed from your display, but not from the total count.
I’ll look closer at fixing that. But the normal display is 8 per page. If you see fewer, then the search is finding “unfinished” plugins and then hiding them.
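One way to keep the count and the display in agreement is to drop hidden entries before counting and slicing into pages. The Python sketch below is only an illustration of that idea, not the fix that was applied: the "hidden" flag and the function name are hypothetical, and the page size of 8 is taken from this thread.

```python
import math

PER_PAGE = 8  # page size mentioned in the thread

def paginate_visible(results: list[dict], page: int, per_page: int = PER_PAGE):
    """Filter hidden results first, then count and slice the remainder."""
    visible = [r for r in results if not r.get("hidden", False)]
    total = len(visible)
    pages = math.ceil(total / per_page)
    start = (page - 1) * per_page
    return visible[start:start + per_page], total, pages

# Illustrative data: 10 matches, 3 of them hidden, similar in shape
# to the "eshop" report above.
hidden_positions = {2, 5, 9}
matches = [{"slug": f"plugin-{i}", "hidden": i in hidden_positions} for i in range(10)]

items, total, pages = paginate_visible(matches, page=1)
print(len(items), total, pages)
```

Filtering first means a non-admin sees a total of 7 on a single full page, instead of a total of 10 spread across two partly empty pages.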
Rich Pedley 10:59 am on December 24, 2010 Permalink
Yeah that would explain it – but won’t help others
Majority of people searching are not admins.
and even so, 8 per page seems weird, can’t you bump it to 10? or is there some reason that octal rather than decimal is being used? Or even better a drop down selection to show 10/25/50 per page.
Rich Pedley 12:59 pm on January 3, 2011 Permalink
a search for directorypress is returning this in the results:
http://wordpress.org/extend/plugins/directorypress/
but that plugin no longer exists, so I’m guessing it got removed. But it’s still being found when searching.
Otto 7:51 pm on January 3, 2011 Permalink
Fixed. Numbers for searches are still going to be a bit off though. Working on it.