Twitter's billion $ search opportunity - Architect it right
When people say "Google Beater" these days, there's a reasonable chance that they're referring to "real time search", or to put it more simply, Twitter search. Google's Eric Schmidt engaged in a war of words with Twitter recently, and TechCrunch has declared that it's time to start thinking of Twitter as a search engine.
Twitter has distinguished itself from other communication channels in a few ways, which have led to its importance as a source of searchable data:
- Messages are broadcast, and not shared in a closed group: Twitter updates are public by default and appear in a "public timeline". Twitter's community has evolved in a way that most users want their updates to be public.
- One-way following is possible: Unlike other social networks that only allow a two-way "friendship" mechanism, Twitter allows any user to "follow" any other user's public timeline. This has helped make Twitter a broadcast mechanism - many journalists and other trusted figures have tens of thousands of "followers" on Twitter.
- Anyone can "reply" to any user's post: Twitter doesn't have any restrictions on who you can reply to (unless, of course, you indulge in spamming, which Twitter detects and which gets you blacklisted).
- Messages are broadcast in real time: Twitter updates show up in your followers' feeds instantly.
- Useful messages get relayed many times over: Twitter's community has created a mechanism of re-tweeting a message, which helps relay important messages to large groups of people.
- Twitter users tag their posts for searchability: The use of "hashtags" on Twitter is a means by which the community can help Twitter searchers find specific information around one theme - e.g., tweets about the Mumbai Attacks were tagged #mumbai.
A few searches on Twitter help illustrate this:
- Want to know all the news and opinions about Obama: http://search.twitter.com/search?q=obama
- What are people saying about AIG's bailout: http://search.twitter.com/search?q=aig+bailout
- What does the world say about Tropicana's new packaging: http://search.twitter.com/search?q=Tropicana+new+packaging
However, since Twitter wasn't built for search (the search app was built outside Twitter by Summize, which Twitter acquired), searching on Twitter still has some holes. Doing a text match on all tweets and sorting them by time is useful, but users are surely looking for "better" tweets to bubble to the top of search results, rather than just the most recent ones. Search, in general, is about relevance and importance, and Twitter isn't architected right for either:
- Importance of the tweet: The ranking of search results should be influenced by some measure of "importance" of the post (in some sense, this is the page rank equivalent of Google). In addition to the recency of a status update, a few measures of importance on Twitter are:
  - the number of retweets,
  - the number of replies received,
  - the number of "favorites" received by the message,
  - and the follower count of the person posting.
- Twitter's retweet mechanism doesn't reference the status id of the message you're re-tweeting. For example, when you look at my retweet - http://twitter.com/vijaycs42/status/1275845600 - you won't be able to find the status-id of the message I am retweeting. This means that Twitter can't easily identify the messages that are getting re-tweeted the most.
- @replies on Twitter are not associated with the tweet being replied to, which means Twitter doesn't have a simple way of figuring out the posts (not the people) that received the most replies.
- Relevance to the query: The fundamental problem is that each tweet has only 140 characters, and that's not much data to work with. And these 140 characters have no structure - unlike web pages that have structured fields like "title" and "anchor text". Hence, searching on Twitter is sort of like the early days of image search and video search, when there wasn't much text associated with the content (today, of course, the abundance of tags and user comments helps find the best images and videos easily). Try searching for the keyword "twitter" on Twitter search: while all the posts mention the word, how many of those posts are really about Twitter? (e.g. I saw a post that said "I swear if anyone spoils Watchmen for me on Twitter, I am gonna go postal"). Luckily for Twitter, they do have a lot of additional text and some structure:
  - The extra text comes from replies and retweets,
  - and the little bit of structure comes from the #tags.
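To make the "importance" idea concrete, here is a minimal sketch in Python of how the signals listed earlier (retweets, replies, favorites, follower count, recency) might be folded into a single ranking score. Everything specific in it is an assumption for illustration: the `Tweet` fields, the weights, and the six-hour half-life are made up, and it presumes that per-tweet retweet and reply counts are available, which is exactly what Twitter's current architecture makes hard.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record; these per-tweet counts are assumed to be available.
    text: str
    retweets: int
    replies: int
    favorites: int
    followers: int
    age_hours: float


def importance(tweet: Tweet, half_life_hours: float = 6.0) -> float:
    """Combine engagement signals with a recency decay (illustrative weights)."""
    # log1p dampens huge counts so one celebrity account does not swamp the rest.
    engagement = (
        2.0 * math.log1p(tweet.retweets)
        + 1.5 * math.log1p(tweet.replies)
        + 1.0 * math.log1p(tweet.favorites)
        + 0.5 * math.log1p(tweet.followers)
    )
    # Exponential decay: the score halves every `half_life_hours`.
    recency = 0.5 ** (tweet.age_hours / half_life_hours)
    return engagement * recency


def rank(tweets: list[Tweet]) -> list[Tweet]:
    """Return tweets ordered from most to least important."""
    return sorted(tweets, key=importance, reverse=True)
```

The multiplicative decay keeps the real-time flavor of Twitter search: a heavily retweeted post outranks a quiet one of the same age, but it still fades as it gets older. The right weights would have to be tuned against real usage data.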
Another thing that bothers me about Twitter search is that it relies heavily on hashtags. For example, try searching for the TV series Lost. If you didn't know that Lost has a hashtag of #lost, you're very likely to just get lost in your search. And today hashtags are very arbitrary. For example, the hashtag for the Mumbai attacks was #mumbai. Now there's no way to make an association between #mumbai and "mumbai attacks" - so you have to know to search for both if you want to retrieve all the results about the incident. Over time there will be situations where different people use different hashtags to refer to the same incident. Twitter should think about a better tagging mechanism that scales nicely. Perhaps use the Flickr or Del.icio.us solution of allowing users to add arbitrary tags to posts; of course, that solution also has a downside - it distracts from the main functionality of Twitter, which is to allow users to tweet with minimal friction.
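One way to picture the missing association between #mumbai and "mumbai attacks" is a small alias table consulted at query time. The sketch below is only an illustration: the function names and the hand-written alias groups are hypothetical, and it says nothing about how such groups would actually be discovered at Twitter's scale.

```python
def build_alias_index(groups):
    """Map every term to the full set of terms in its alias group."""
    index = {}
    for group in groups:
        normalized = {term.strip().lower() for term in group}
        for term in normalized:
            # Merge, so a term that appears in two groups links them together.
            index.setdefault(term, set()).update(normalized)
    return index


def expand_query(query, index):
    """Return every known variant of the query, or just the query itself."""
    q = query.strip().lower()
    return sorted(index.get(q, {q}))


# Hypothetical, hand-curated alias groups.
aliases = build_alias_index([
    ["#mumbai", "mumbai attacks"],
    ["#lost", "lost tv series"],
])

print(expand_query("Mumbai Attacks", aliases))  # ['#mumbai', 'mumbai attacks']
```

A search front end could then run each variant and merge the results, so the user doesn't need to know the hashtag in advance.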
Twitter is already one of the most useful services out there in terms of the social function it serves. Now, it also has the potential to become one of the most useful search services. If it manages to fulfill that potential, it will certainly be worth a billion dollars, or maybe several. But getting there would need them to re-architect Twitter to make the search functionality more powerful.
Great post, Vijay. Here are some comments regarding scaling Twitter's usefulness as search.
For most searches there might not be any "tweeters" with idle time to respond to them. For popular events there might be enough people willing to engage in that conversation. For most long tail queries, and the volume that goes with them, this is not scalable, as it requires real humans to answer on the other end rather than machines and algorithms.
Tags don't scale with scale. Flickr tags have not kept up with the sheer number of images uploaded. The ambiguity problem is even more severe, with most tags irrelevant or weakly related. It's not impossible to solve this, just harder.
Minor nit: you refer to Citibank but point to the AIG search.
Posted by: mailman | March 08, 2009 at 07:24 PM
Good catch about the Citigroup/AIG mixup, fixed it now.
Agree about your point on tagging, as I'd indicated in the post as well. I admit I don't know the right answer here, but hashtags are certainly not the best solution.
Posted by: Vijay Chittoor | March 08, 2009 at 08:05 PM
The friction of setting up an account and tweeting to look for an answer would shut out a whole generation of users. To get a foothold in search, Twitter has to wait for a whole new generation of hyper-internet-friendly Facebookers to become the norm.
Wouldn't it be better served as a mouthpiece platform for bloggers/writers/shouters? It's similar to what YouTube became: broadcasting for the rest of us. And Twitter could be adding a layer to that.
Posted by: mailman | March 09, 2009 at 07:19 PM