When people say "Google Beater" these days, there's a reasonable chance that they're referring to "real-time search", or to put it more simply, Twitter search. Google's Eric Schmidt engaged in a war of words with Twitter recently, and TechCrunch has declared that it's time to start thinking of Twitter as a search engine.
Twitter has distinguished itself from other communication channels in a few ways that have made it an important source of searchable data:
- Messages are broadcast, and not shared in a closed group: Twitter updates are public by default and appear in a "public timeline". Twitter's community has evolved such that most users want their updates to be public.
- One-way following is possible: Unlike other social networks that only allow a two-way "friendship" mechanism, Twitter allows any user to "follow" any other user's public timeline. This has helped make Twitter a broadcast mechanism - many journalists and other trusted figures have tens of thousands of "followers" on Twitter.
- Anyone can "reply" to any user's post: Twitter doesn't have any restrictions on who you can reply to (unless, of course, you indulge in spamming, which Twitter detects and blacklists you for).
- Messages are broadcast in real time: Twitter updates show up in your followers' feeds instantly.
- Useful messages get relayed many times over: Twitter's community has created a mechanism of re-tweeting a message, which helps relay important messages to large groups of people.
- Twitter users tag their posts for searchability: The use of "hashtags" on Twitter is a means by which the community can help Twitter searchers find specific information around one theme - e.g., tweets about the Mumbai attacks were tagged #mumbai.
Due to these features, Twitter has become the primary destination for broadcasting real-time information, be it about a big global event like the Mumbai blasts with an audience of millions, or a talk at a conference that is being discussed over tweets by a few tens or hundreds of people. Journalists, bloggers and marketing professionals are falling over each other to engage in conversations and help shape opinion over Twitter.
A few searches on Twitter help illustrate this:
On some of the most news-y and hotly discussed topics, Twitter search results update faster than you can keep pace with. A Twitter search is an invaluable resource for anyone engaged in understanding news and opinions.
However, since Twitter wasn't built for search (the search app was built outside Twitter by Summize, which Twitter acquired), searching on Twitter still has some holes. Doing a text match on all tweets and sorting them by time is useful, but users are surely looking for some notion of having "better" tweets bubble to the top of search results, rather than just looking at the most recent tweets. Search, in general, is about relevance and importance, and Twitter isn't architected right for either:
- Importance of the tweet: The ranking of search results should be influenced by some measure of "importance" of the post (in some sense, this is the equivalent of Google's PageRank). In addition to the recency of a status update, a few measures of importance on Twitter are:
- the number of retweets,
- the number of replies received,
- the number of "favorites" received by the message
- and the follower count of the person posting.
Twitter's architecture today lets its search use only some of these measures (the date/time, the number of favorites and the follower count); retweets and replies are out of reach:
- Twitter's retweet mechanism doesn't reference the status id of the message you're re-tweeting. For example, when you look at my retweet - http://twitter.com/vijaycs42/status/1275845600 - you won't be able to find the status-id of the message I am retweeting. This means that Twitter can't easily identify the messages that are getting re-tweeted the most.
- @replies on Twitter are not associated with the tweet being replied to, which means Twitter doesn't have a simple way of figuring out the posts (not the people) that received the most replies.
- Relevance to the query: The fundamental problem is that each tweet has only 140 characters, and that's not much data to work with. These 140 characters also have no structure, unlike web pages, which have structured fields like "title" and "anchor text". Hence, searching on Twitter is a bit like the early days of image search and video search, when there wasn't much text associated with the content (today, of course, the abundance of tags and user comments helps find the best images and videos easily). Try searching for the keyword "twitter" on Twitter search: while all the posts mention the word, how many of those posts are really about Twitter? (For example, I saw a post that said "I swear if anyone spoils Watchmen for me on Twitter, I am gonna go postal".) Luckily for Twitter, they do have a lot of additional text and some structure:
- The extra text comes from replies and retweets,
- and the little bit of structure comes from the #tags
But, as I mentioned above, @replies and retweets are not associated well with tweets in Twitter's current architecture. Fixing that will not only give Twitter a better importance signal, but also help improve text relevance by making a lot more text available for search. Twitter can gather even more data around tweets that contain a URL by indexing all or some of the contents of the page the URL points to.
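To make the importance idea a little more concrete, here is a rough Python sketch of how the signals listed above could be combined once replies and retweets point back at a status id. Everything in it is an assumption for illustration: the `Tweet` fields `in_reply_to_id` and `retweet_of_id` are hypothetical, the weights are arbitrary, and recency is left out to keep it short. It is not how Twitter ranks anything.

```python
from collections import Counter
from dataclasses import dataclass
from math import log1p
from typing import Optional


@dataclass
class Tweet:
    status_id: int
    author: str
    text: str
    followers: int = 0
    favorites: int = 0
    # Hypothetical link fields: explicit references to the original status.
    in_reply_to_id: Optional[int] = None
    retweet_of_id: Optional[int] = None


def importance_scores(tweets):
    """Return {status_id: score}; retweets are counted toward their source."""
    retweets = Counter(t.retweet_of_id for t in tweets if t.retweet_of_id is not None)
    replies = Counter(t.in_reply_to_id for t in tweets if t.in_reply_to_id is not None)
    scores = {}
    for t in tweets:
        if t.retweet_of_id is not None:
            continue  # a retweet only boosts the tweet it points at
        # log1p dampens large counts; the weights are arbitrary placeholders.
        scores[t.status_id] = (
            2.0 * log1p(retweets[t.status_id])
            + 1.5 * log1p(replies[t.status_id])
            + 1.0 * log1p(t.favorites)
            + 0.5 * log1p(t.followers)
        )
    return scores


tweets = [
    Tweet(1, "alice", "Big news from the conference keynote", followers=100),
    Tweet(2, "bob", "Good morning everyone", followers=5000),
    Tweet(3, "carol", "RT @alice: Big news from the conference keynote", retweet_of_id=1),
    Tweet(4, "dave", "RT @alice: Big news from the conference keynote", retweet_of_id=1),
    Tweet(5, "erin", "@alice which keynote?", in_reply_to_id=1),
]
scores = importance_scores(tweets)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy data the tweet from the account with 100 followers outranks the one from the account with 5000, because the two retweets and the reply can be credited to it. Without the link fields, only the follower count would be available to tell them apart.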
In some sense, both the above themes are about how it's much easier (and much more valuable) to accurately search conversations (a bunch of related tweets) than to search individual tweets. Facebook has probably got a lead over Twitter in this aspect, because they naturally group conversations together.
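As a small illustration of why conversations are easier to search than single tweets, the sketch below groups tweets into threads and matches a query against each thread's combined text. It assumes every reply or retweet carries the id of its parent, which is exactly what is missing today; the tuple layout and function names are made up for this example, and the matching is a crude substring test.

```python
from collections import defaultdict


def conversation_root(status_id, parent_of):
    """Follow parent links up to the tweet that started the thread."""
    seen = set()
    while status_id in parent_of and status_id not in seen:
        seen.add(status_id)  # guard against accidental cycles
        status_id = parent_of[status_id]
    return status_id


def build_conversations(tweets):
    """tweets: iterable of (status_id, text, parent_id or None).

    Returns {root_id: [texts in input order]}.
    """
    tweets = list(tweets)
    parent_of = {sid: parent for sid, _, parent in tweets if parent is not None}
    threads = defaultdict(list)
    for sid, text, _ in tweets:
        threads[conversation_root(sid, parent_of)].append(text)
    return dict(threads)


def search_conversations(threads, query):
    """Return root ids whose combined thread text contains every query term."""
    terms = query.lower().split()
    hits = []
    for root, texts in threads.items():
        blob = " ".join(texts).lower()
        if all(term in blob for term in terms):
            hits.append(root)
    return hits


tweets = [
    (1, "Anyone watching tonight?", None),
    (2, "@alice yes, the Lost finale", 1),
    (3, "Lunch time", None),
]
threads = build_conversations(tweets)
hits = search_conversations(threads, "watching finale")
```

Neither tweet in the first thread contains both "watching" and "finale", but the thread as a whole does, so the query finds the conversation where a per-tweet match would find nothing.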
Another thing that bothers me about Twitter search is that it relies heavily on hashtags. For example, try to search for the TV series Lost. If you didn't know that Lost has a hashtag of #lost, you're very likely to just get lost in your search. And today hashtags are very arbitrary. For example, the hashtag for the Mumbai attacks was #mumbai. Now there's no way to make an association between the words #mumbai and "mumbai attacks" - so you have to know to search for both if you want to retrieve all the results about the incident. Over time there will be situations when different people are using different hashtags to refer to the same incident. Twitter should think about a better tagging mechanism that scales nicely. Perhaps it could use the Flickr or Del.icio.us solution of allowing users to add arbitrary tags to posts; of course, that solution also has a downside - it distracts from the main functionality of Twitter, which is to allow users to tweet with minimal friction.
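One lightweight way to paper over the hashtag problem, short of a new tagging mechanism, would be to expand a query with known aliases before matching. The sketch below is only an illustration: the alias table is hypothetical and hand-written (nothing like it is exposed by Twitter), and the matching is a plain substring test.

```python
# Hypothetical, hand-maintained alias table mapping a hashtag to plain phrases.
HASHTAG_ALIASES = {
    "#mumbai": ["mumbai attacks"],
}


def expand_query(query, aliases=HASHTAG_ALIASES):
    """Return the query plus any hashtag or phrase known to mean the same thing."""
    q = query.strip().lower()
    variants = [q]
    for tag, phrases in aliases.items():
        if q == tag:
            variants.extend(phrases)
        elif q in phrases:
            variants.append(tag)
    return variants


def search(tweets, query, aliases=HASHTAG_ALIASES):
    """Return the tweets (plain strings) that contain any variant of the query."""
    variants = expand_query(query, aliases)
    return [t for t in tweets if any(v in t.lower() for v in variants)]


tweets = [
    "Stay safe everyone #mumbai",
    "Live coverage of the Mumbai attacks",
    "Lunch time",
]
results = search(tweets, "mumbai attacks")
```

Here a search for "mumbai attacks" also returns the tweet that only carries #mumbai. The hard part, of course, is building and maintaining the alias table, which this sketch simply assumes exists.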
Twitter is already one of the most useful services out there in terms of the social function it serves. Now, it also has the potential to become one of the most useful search services. If it manages to fulfill that potential, it will certainly be worth a billion dollars, or maybe several. But getting there would need them to re-architect Twitter to make the search functionality more powerful.