October 08, 2009

It's Y!ou. Or is it?

What do you feel about Yahoo!'s new ad push? The ads made a big splash in India a few days ago, and now it turns out that New York is also plastered with them. 

I feel that the branding of internet products is somewhat like branding in the service industry: the brand is all about the customer experience and not about the tagline (here's a good article about how the brand is not about the slogan). For instance, it doesn't matter to me if a bank comes up with the coolest slogan, if that slogan has nothing to do with my experience banking there, or if I have a poor experience. In some sense, web companies are also service providers, and if they fulfill a great service promise, they win with users, and they lose if they don't.

In that light, I don't understand how Yahoo! is about me. It seems that they want to latch on to a hot new theme, without doing anything to fulfill the promise of that experience. What do you feel?

June 22, 2009

Cross posts from Inc

I've started writing a Byline with inc.com, around the theme of analyzing technology trends for small businesses & entrepreneurs. Here are the first two articles that have been published:

  1. Lessons from Web 2.0, Fast Track Innovation Process: This article starts with the premise that Web 2.0 companies have some natural advantages that help them innovate fast, and then examines how other businesses can leverage some of these ideas.
  2. The Long Tail & The Black Swans: I examine how the Black Swan, or the unexpected hit, can influence the analysis of whether & how you should use a long-tail business model.

Would love to get your feedback.

March 08, 2009

Twitter's billion $ search opportunity - Architect it right

When people say "Google Beater" these days, there's a reasonable chance that they're referring to "real time search", or to put it more simply, Twitter search. Google's Eric Schmidt engaged in a war of words with Twitter recently, and Techcrunch has declared that it's time to start thinking of Twitter as a search engine.

Twitter has distinguished itself from other communication channels in a few ways which have led to its importance as a source of search-able data: 

  1. Messages are broadcast, and not shared in a close group: Twitter updates are public by default and appear in a "public timeline". Twitter's community has evolved in a way that most users want their updates to be public.
  2. One-way following is possible: Unlike other social networks that only allow a two-way "friendship" mechanism, Twitter allows any user to "follow" any other user's public timeline. This has helped make Twitter a broadcast mechanism - many journalists and other trusted figures have tens of thousands of "followers" on Twitter.
  3. Anyone can "reply" to any user's post: Twitter doesn't have any restrictions on who you can reply to (unless, of course, you indulge in spamming, which Twitter detects and blacklists you for).
  4. Messages are broadcast in real time: Twitter updates show up in your followers' feeds instantly.
  5. Useful messages get relayed many times over: Twitter's community has created a mechanism of re-tweeting a message, which helps relay important messages to large groups of people.
  6. Twitter users tag their posts for searchability: The use of "hashtags" on Twitter is a means by which the community can help Twitter searchers find specific information around one theme - e.g., tweets about the Mumbai Attacks were tagged #mumbai.

Due to these features, Twitter has become the primary destination to broadcast real time information, be it about a big global event like the Mumbai blasts with an audience of millions, or a talk at a conference which is being discussed over tweets by a few tens or hundreds of people. Journalists, bloggers & marketing professionals are falling over each other to engage in conversations and help shape opinion over Twitter.

A few searches on Twitter help illustrate this:
On some of the most news-y and hotly discussed topics, Twitter search results update faster than you can keep pace with. A Twitter search is an invaluable resource for anyone engaged in understanding news and opinions.

However, since Twitter wasn't built for search (the search app was built outside Twitter by Summize, which Twitter acquired), searching on Twitter still has some holes. Doing a text match on all tweets and sorting them by time is useful, but users surely want "better" tweets to bubble to the top of search results, rather than just seeing the most recent tweets. Search, in general, is about relevance and importance, and Twitter isn't architected right for either:
  • Importance of the tweet: The ranking of search results should be influenced by some measure of "importance" of the post (in some sense, this is the page rank equivalent of Google). In addition to the recency of a status update, a few measures of importance on twitter are
    • the number of retweets,
    • the number of replies received,
    • the number of "favorites" received by the message
    • and the follower count of the person posting.
    Twitter's architecture today lets its search use only the date/time, the number of favorites and the follower count. The other measures of importance are out of reach:
    1. Twitter's retweet mechanism doesn't reference the status id of the message you're re-tweeting. For example, when you look at my retweet - http://twitter.com/vijaycs42/status/1275845600 - you won't be able to find the status-id of the message I am retweeting. This means that Twitter can't easily identify the messages that are getting re-tweeted the most.
    2. @replies on Twitter are not associated with the tweet being replied to: which means, Twitter doesn't have a simple way of figuring out the posts (not the people) that received the most replies.
  • Relevance to the query: The fundamental problem is that each tweet has only 140 characters, and that's not much data to work with. And these 140 characters have no structure - unlike web pages that have structured fields like "title" and "anchor text". Hence, searching on Twitter is sort of like the early days of image search and video search, when there wasn't much text associated with the content (today, of course, the abundance of tags and user comments helps find the best images and videos easily). Try searching for the keyword "twitter" on Twitter search - while all the posts mention the word, how many of those posts are really about Twitter? (e.g. I saw a post that said "I swear if anyone spoils Watchmen for me on Twitter, I am gonna go postal"). Luckily for Twitter, they do have a lot of additional text and some structure:
    1. The extra text comes from replies and retweets,
    2. and the little bit of structure comes from the #tags
But, as I mentioned above, @replies and retweets are not associated well with tweets, in Twitter's current architecture. Fixing that will not only give Twitter a better importance signal, but also help improve text relevance by making a lot more text available for search. Twitter can gather even more data around tweets that contain a URL by indexing all or some of the contents of the url.
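
To make the "importance" idea a bit more concrete, here is a rough sketch of how the signals listed above could be blended into one score. Everything in it is an assumption made for illustration: the `Tweet` fields, the weights, and the six-hour half-life are invented, and this is not how Twitter's search actually ranks anything.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record; assumes retweets and replies are already
    # linked back to the original status id.
    text: str
    age_hours: float
    retweets: int = 0
    replies: int = 0
    favorites: int = 0
    author_followers: int = 0


def importance(t: Tweet, half_life_hours: float = 6.0) -> float:
    # Log-scale each count so one huge number can't dominate the score.
    # The weights are arbitrary placeholders.
    engagement = (
        2.0 * math.log1p(t.retweets)
        + 1.5 * math.log1p(t.replies)
        + 1.0 * math.log1p(t.favorites)
        + 0.5 * math.log1p(t.author_followers)
    )
    # Recency decays exponentially: the score halves every half_life_hours.
    recency = 0.5 ** (t.age_hours / half_life_hours)
    return (1.0 + engagement) * recency


def rank(tweets):
    # Highest importance first.
    return sorted(tweets, key=importance, reverse=True)
```

With these made-up weights, a six-hour-old tweet with 100 retweets still outranks a brand new tweet with no engagement, which is the kind of behavior a pure time sort can't give you.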

In some sense, both the above themes are about how it's much easier (and much more valuable) to accurately search conversations (a bunch of related tweets) than to search individual tweets. Facebook has probably got a lead over Twitter in this aspect, because they naturally group conversations together.

Another thing that bothers me about Twitter search is that it relies heavily on hash tags. For example, try and search for the TV series Lost. If you didn't know that Lost has a hashtag of #lost, you're very likely to just get lost in your search. And today hash-tags are very arbitrary. For example, the hash-tag for the Mumbai attacks was #mumbai. Now there's no way to make an association between the tag #mumbai and the words "mumbai attacks" - so you have to know to search for both if you want to retrieve all the results about the incident. Over time there will be situations when different people are using different hash tags to refer to the same incident. Twitter should think about a better tagging mechanism that scales nicely. Perhaps use the Flickr or Del.icio.us solution of allowing users to add arbitrary tags to posts; of course, that solution also has a downside - it distracts from the main functionality of Twitter, which is to allow users to tweet with minimal friction.

Twitter is already one of the most useful services out there in terms of the social function it serves. Now, it also has the potential to become one of the most useful search services. If it manages to fulfill that potential, it will certainly be worth a billion dollars, or maybe several. But getting there would need them to re-architect Twitter to make the search functionality more powerful.

December 08, 2008

Kosmix Beta is live

I haven't posted anything on this blog for a bit, and for a good reason: along with a lot of folks from Kosmix, I've been busy building out our universal search product. It just went live today, along with an announcement of a new round of funding for Kosmix. You can read more about it on the Kosmix blog.

Do give it a spin. And I'd love to hear your feedback.

October 18, 2008

Measuring the impact of PR for Consumer Web companies

Despite the amount of effort and time that internet startup execs and CEOs spend on PR and media outreach, very few have a good understanding of the impact of their efforts, at either the macro level (i.e., the entire PR strategy) or the micro level (specific press mentions). You don't often hear the word "metrics" associated with the world of PR and marketing, but there's no denying that having a deeper understanding helps in multiple ways - figuring out what media outlets drive more traffic, which ones produce more engaged visitors, what stories get picked up virally, and what message works best with consumers.

A recent article about Kosmix on Lifehacker clearly showed me how little we understand about our own PR efforts. Till we accidentally stumbled upon some metrics around this article, which was featured on the Lifehacker homepage on Oct 13, none of us even realized that this was turning out to be one of our most impactful mentions in the media. That's not surprising, because in the Valley we're conditioned to ignore blogs or sites that don't include "tech" or "venture" in their name.

So what do I mean when I say that this article was one of our more successful mentions? Here are some metrics that I looked at:

  • The number of visits from this source made it rank among our top referrers on the day this article was carried and the next couple of days.
  • The visitors from this channel were among the most engaged visitors we've ever seen: the amount of time they spent on our site was 3 times the site's average, the page views per visit were 2 times the site's average, their bounce rate was half that of the site's average, and these users left a ton of feedback for us using the feedback form on our topic pages.
  • Doing a search on Friendfeed or del.icio.us for Kosmix.com showed us that hundreds of visitors had bookmarked our site or spread the word using various social bookmarking and sharing tools. Bookmarks of course help drive adoption, but they additionally help in the likelihood of your pages being found through search engines (check out my previous post).

And to think except for an accident we'd never even have figured this out!! Of course, knowing this helps us immensely as we get ready for our next wave of media outreach.

Here are some ways you can deepen the understanding of your media efforts:

  • For each press mention, track the # of visitors as well as all the engagement and conversion metrics. Google Analytics can be a great tool for startups that don't have an in-house metrics system.
  • Track the number of bookmarks each day and correlate that with media mentions. To find out the number of del.icio.us bookmarks by date, you can go to http://delicious.com/url, and type in the URL, or use their API. Likewise, Friendfeed, Stumbleupon, and Digg might be other places to look.
  • Go through the notes written by users on del.icio.us and other sites for bookmarking and sharing, and use the comments to fine tune your positioning and your communications. After all, positioning is what users think about you, not what you think about yourself - you can only reinforce users' mental picture through effective communication.
  • Track the number of repeat visitors by source using cookies. Use this to understand the kind of blogs or sites, or specific press mentions, that drive the most repeat users.
  • Make it easy for users to leave feedback for you, and track this feedback by source. Act on the feedback before your next PR effort.
  • Test a few variations of your core message and check if any of them leads to more engaged users; or if any one of them tends to drive more traffic to your site than others.
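
If you don't have an analytics package handy, the per-source comparison is easy to sketch yourself. The snippet below is only an illustration: it assumes you can export one `(source, seconds_on_site, page_views)` record per visit from your logs, and it treats a visit with a single page view as a bounce, which is a common but simplified definition.

```python
from collections import defaultdict


def engagement_by_source(visits):
    """Summarize engagement per referring source.

    visits: iterable of (source, seconds_on_site, page_views) tuples.
    This record layout is a hypothetical export format, not a standard one.
    """
    # Per source: [visit count, total seconds, total page views, bounces]
    totals = defaultdict(lambda: [0, 0.0, 0, 0])
    for source, seconds, pages in visits:
        t = totals[source]
        t[0] += 1
        t[1] += seconds
        t[2] += pages
        t[3] += 1 if pages <= 1 else 0  # assumed bounce definition

    report = {}
    for source, (n, seconds, pages, bounces) in totals.items():
        report[source] = {
            "visits": n,
            "avg_seconds": seconds / n,
            "pages_per_visit": pages / n,
            "bounce_rate": bounces / n,
        }
    return report
```

Comparing each source's numbers against the site-wide averages is what surfaced the Lifehacker effect for us; the same table, broken out by press mention, tells you which outlets send engaged visitors rather than just traffic.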

It's also worthwhile doing this exercise on competitors and seeing if they're having more success than you:)

Would love to hear from you if you have other ideas on tracking the effectiveness of PR. I know that most readers of this blog will likely have more experience with marketing communications than I do, and it'd be good to understand if any of this resonates with you.

September 20, 2008

Understanding Mahalo's SEO machine

One of the biggest challenges that Web startups face is driving traffic to their sites. Mahalo, launched in 2007 as the human-powered search engine, has successfully crossed the first set of hurdles, quickly reaching 2.2 million unique visitors a month and still growing.

How does Mahalo do this? The quick answer is Search Engine Optimization (SEO), i.e., optimizing their pages to make them appear towards the top in search results. In the Web 1.0 world, About.com successfully executed this strategy, and became one of the most visited sites on the internet, before getting bought out by the New York Times for about $410 million. While Mahalo's numbers are nowhere close to About.com's, their growth trajectory has been pretty good.

It turns out that a bit of quick sleuthing using publicly available tools can reveal a lot about Mahalo's SEO tactics. Let me share a few quick insights here, that may help other sites trying to emulate Mahalo's success. 

First up, check out Quantcast to find some of the keywords which drove traffic to Mahalo last month. If you're serious about taking this analysis to the next level, you'd subscribe to paid services like Hitwise or Comscore which would perhaps give you access to more keywords, but for now, let's limit ourselves to the few keywords we find on Quantcast.

One of these keywords is "Bernie Mac Dead". As of this writing, a search on this keyword showed Mahalo to be the 3rd result on Google with the URL www.mahalo.com/Bernie_Mac_Dead

The next step is to analyze the links pointing to this URL. To do this, go to Yahoo, and search for "link:www.mahalo.com/Bernie_Mac_Dead"

Of course, Yahoo's index is different from Google's and some of the links might be different, but you still get a pretty good idea. You'll notice that many of these links are from Mahalo's own pages, belonging to the directory mahalo.com/member/.. A quick look at all these members will tell you that they're all Mahalo guides - e.g., Julia, Bernices and Evan D - check out the "About" tab in each case.

Some more links come from Friendfeed: from the accounts of Jason Calacanis and Andrew Dobrow. Jason is a founder of Mahalo, and Andrew is a guide at Mahalo, as you can quickly figure out by googling them. You'll further notice that Friendfeed picked up these items from Del.icio.us and Twitter. You'll also notice links from Jason's and others' accounts on Tumblr, Magnolia, Swurl and more.

Some of the links come from the YouTube sites of various countries. To find these links, go to the section "Statistics and Data". Here I confess that I am very puzzled: while the Mahalo page on Bernie Mac Dead does link to 1 (and only 1) YouTube video, the link to this page appears in several other YouTube videos which are not linked off from Mahalo's Bernie Mac Dead page; e.g. this video on the Olympics thinks that it's being referenced by the Bernie Mac Dead page on Mahalo. The only way I can think of doing this is to create dummy pages which link to this video and then have those pages redirect to the Bernie Mac Dead page (???) - ok, I might as well confess I have no clue how this is done. Experts please chime in.

Mahalo also cross-links heavily to the Bernie Mac Dead page, with links from the Mahalo pages for Isaac Hayes Stroke and many others.

Mahalo also seems to have an arrangement with the Rochester Public Library, so that the link 

http://www.rochesterpubliclibrary.org/apps/Reference/reflinks/redirect.cfm?2079

redirects to Mahalo - pretty sweet, huh?

Ok, there's a lot more but let me leave that as an exercise to the reader.

The analysis so far was at the level of how Mahalo optimizes the links to each page. At the page level, one could also analyze the contents of the page. The title and the URL both mention the target keyword, and there's also a big H1 with "Bernie Mac Dead" right at the top. The other notable aspect is the lack of nofollow attributes on the links from the page, which you can check by doing a view source on the page; it's interesting that Mahalo defies conventional wisdom by not attaching nofollow to external links. Perhaps that makes the linking pattern appear more natural to Google, and favors Mahalo? Again, would love to get tips from SEO experts on this.
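
Rather than eyeballing the page source, you can script the nofollow check. The sketch below uses only Python's standard library; the function name and the idea of passing in the site's own host name are my own choices for illustration, and you'd need to fetch the page HTML yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkAudit(HTMLParser):
    """Splits a page's external links into followed and nofollowed."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        host = urlparse(href).netloc
        if not host or host == self.own_host:
            return  # skip relative links and links back to the same site
        # rel can hold several space-separated values, e.g. "external nofollow"
        rel = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def audit_external_links(html, own_host):
    parser = LinkAudit(own_host)
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollowed
```

Calling `audit_external_links(page_html, "www.mahalo.com")` on a saved copy of a page returns the two lists, so you can see at a glance what share of external links carry nofollow.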

Finally, we should analyze the SEO strategy at a more macro level instead of at the level of individual pages. Mahalo chooses keywords and pages that are very topical, and constantly refreshes the keywords that appear on its home page. Some time ago I saw some discussion around Google's "news worthiness" score, which ranks very topical, fresh and constantly updated pages very high for topics that are news worthy. Mahalo seems to be playing well to that tune. If you google any of the topics featured on Mahalo's homepage, chances are that you'll find Mahalo in the top 10 results.

Mahalo also encourages deep links from others by distributing its content under Creative Commons.

Not being an SEO expert, I'm sure I've overlooked some aspects of Mahalo's strategy. But overall, whatever I've seen convinces me that SEO must be a religion at Mahalo, and that everyone at the company participates by using links from their social bookmarking/networking accounts, as well as by gathering external links using a variety of techniques.

While some of these techniques might be suspect, I guess many of them are legitimate, and it might help many other startups to learn some of them.

September 19, 2008

Ebay selling Stumbleupon? Social bookmarking and search

I have been following the story around eBay selling Stumbleupon, just a year and a half after acquiring it for $75 million. Around the time of the acquisition, many people speculated that eBay wanted to get a foothold in the search market, otherwise largely dominated by Google; this GigaOm article was one of those, mentioning "The toolbar, if you ask StumbleUpon users provides more useful and productive results, than say Google. By marrying the toolbar to Skype client, eBay can do an end run around Google’s dominance of the search business. A simple search box inside Skype client is all it would take."


Interestingly, many people believe that social bookmarking sites were the first form of search to use "tagging", and are therefore a revolutionary new technique in search. After all, what could be better than letting humans figure out which pages to tag with keywords like "barack obama", instead of letting algorithms do this job? The inherent assumption is that conventional search engines don't use any form of tagging.
 

However, this view of the world is flawed. I'd like to argue that search engines like Google use human tagging in a big way in their search rankings. Except that it's a different kind of tagging - the tags used by publishers on the web to refer to the documents they link to. Conventionally, this is called "anchor text" in search engine lingo, but if you look at it, it serves the same purpose as tagging on bookmarking sites like stumbleupon. The only difference is that Stumbleupon and Del.icio.us expand the scope of the tagging to users who might not be writing or blogging on the web. 

Let me take a couple of examples to make this more concrete. Search for "Sarah Palin" on Yahoo, and take a look at any of the links. Then try the search "link: <url>" on Yahoo: like this. This will show you all the webpages with links pointing to the URL you're analyzing. If you look at the text of these links, you'll see that many of them refer to the URL we're analyzing with the words "Sarah Palin" or variants thereof. For example, doing a "view source" on http://abfreedom.blogspot.com reveals this link: <a href="http://www.johnmccain.com/about/governorpalin.htm">Sarah Palin</a>
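
To see anchor text acting as a tag, you can collect the link text that a set of pages uses when pointing at one URL. This is a minimal sketch using Python's standard library: it assumes you already have the HTML of the linking pages, and it only matches links whose href is exactly the target URL.

```python
from collections import Counter
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Counts the anchor text of links that point at target_url."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.texts = Counter()
        self._buffer = None  # collects text while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") == self.target_url:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            # Collapse whitespace and lowercase so variants count together.
            text = " ".join("".join(self._buffer).split()).lower()
            if text:
                self.texts[text] += 1
            self._buffer = None


def anchor_texts(pages, target_url):
    """pages: iterable of HTML strings from pages that link to target_url."""
    counts = Counter()
    for html in pages:
        collector = AnchorTextCollector(target_url)
        collector.feed(html)
        collector.close()
        counts.update(collector.texts)
    return counts
```

Run over the pages returned by a "link:" search, the most common entries in the counter are effectively the tags that web publishers have assigned to that URL.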


If you compare the search results on a popular keyword like "Sarah Palin", you'll find that search engines like Google and Yahoo have many more results, as well as more relevant ones, than social bookmarking sites like Stumbleupon or Del.icio.us. This difference is even more stark when you move to more tail-ish keywords: e.g., try "Miruts Yifter", which produces just 1 result on Stumbleupon but more than 15,000 relevant ones on Google. And most of those 15,000 results are linked off from other pages on the web with the anchor text (or tag) "Miruts Yifter".

Essentially, Google gets to use tags much more than Stumbleupon. And to top it all, they could also crawl Stumbleupon pages and use the tags available on those.

But the key thing to note is that anchor text/tags are just 1 component of the ranking mechanism on classic search engines. Link popularity, text on the page/title/url and many other components go into the final ranking.

It's just naive to assume that social bookmarking sites will be able to build better search just based on "tagging".

However, these sites do provide value to users in other ways: 1) a permanent store of users' bookmarks 2) ability to discover people interested in a similar topic 3) a fun browsing experience. For all these reasons, social bookmarking sites aren't going to go away, but don't expect them to develop into full fledged search engines. Ebay's rumored decision to sell Stumbleupon might be a reflection of this realization.

August 28, 2008

Dynamics of Online Web Economics: Dropping CPMs and implications

The average CPM of all internet page views can be calculated as the total $ amount spent in online advertising divided by the total number of page views.

Please note that CPM in this case also refers to the effective CPM of CPC and CPA ads, and doesn't refer only to ads sold on CPM pricing.

If it can be shown that the denominator (# page views) will grow faster than  the numerator ($s spent online), then it must follow that the average CPMs on the internet will come down.

Many sources track the numerator: in the United States, online ad spend is growing at about 18% according to this IAB release.

But calculating the growth in page views proved trickier. Instead, let me calculate a proxy: aggregate time spent online, i.e., time spent online per user times the number of online users. Time spent online per user is growing about 25% YOY according to this ITFACTS.biz article quoting Compete; in addition, the number of users has also grown. While I couldn't find the latest data, a Pew report showed that the number of Americans online grew by 14 million people between 2005 and 2006, which is approximately 7% of the population. Putting the 2 together roughly adds up to about 32%.

Clearly, growth in time spent may not linearly translate into growth in page views, especially due to the growth of videos which can consume substantial time in 1 page view.  However, the effect of videos is easy to strip out by looking at this Nielsen report: if you do the math, it's clear that video views account for only 8.7% of all time spent online. Even if 100% of that has been added in the last couple of years, we have a net growth in  non-video time spent of about 24%, and this number is likely higher.

Putting all this together, we can conclude with reasonable certainty that PVs online are growing faster than online ad $s.

Which means that average CPMs are dropping. From there it would take a couple of small leaps of faith to conclude that 1) this is a long term trend (after all the sum total of ad dollar growth, online+offline, is finally indexed to GDP growth; whereas PV growth can be much more elastic) 2) this would affect not only the newcomers but also the incumbents.
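
To put a rough number on the drop: since average CPM is ad dollars divided by page views, its yearly change is the ratio of the two growth factors. The sketch below plugs in the approximate figures from this post (18% ad spend growth, and 24% growth in non-video time spent as a stand-in for page view growth); both inputs are estimates stitched together from different sources, so treat the result as an order of magnitude.

```python
def cpm_change(ad_spend_growth, page_view_growth):
    """Fractional year-over-year change in average CPM.

    CPM is proportional to (ad dollars / page views), so the new CPM is
    the old CPM times (1 + ad growth) / (1 + page view growth).
    """
    return (1 + ad_spend_growth) / (1 + page_view_growth) - 1


# Inputs are the rough estimates used in this post.
change = cpm_change(ad_spend_growth=0.18, page_view_growth=0.24)
print(f"Average CPM change: {change:.1%}")
```

With these inputs the average CPM falls by a little under 5% a year, and the decline gets steeper if page views are really growing closer to the 32% figure.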

How does one react to dropping CPMs? What can you do if you're an online service whose primary revenue stream is online ads?

Well, firstly, don't panic. The costs of running an online service are dropping dramatically too, and that helps all companies maintain their spreads. New companies always recognize these dynamics very well, but the incumbents in any industry have the tendency to be fat cats. The cost per page view served should become an important metric for all internet CEOs. Hardware cost is certainly one component of this cost and it tends to be easy to manage; what's harder usually are the people costs where companies must innovate to leverage small teams of employees who don't scale linearly in size with the growth in website traffic.

Secondly, all the dynamics I mentioned in terms of the increase in PVs on the internet can work to your benefit. If you manage to grow your page views to match or beat the rate of growth on the internet as a whole, you'll likely do well. Especially if you keep your cost of serving additional page views down and maintain your spreads. You must set very aggressive goals for page view growth.

Thirdly, this has implications for how you determine your marketing costs. Marketing costs must come down on a CPM basis, and companies must move to ever more efficient marketing channels, and perform ROI analyses of all their channels.

Finally, these trends also have implications on how you manage your ad sales force. Clearly, your sales team needs to understand these dynamics and accordingly seek to optimize the tradeoffs between sell throughs and CPMs. In addition to this, companies relying on online ads for revenues must also consider low cost self serve models.

In all of these, there are opportunities for intermediaries and aggregators (e.g., for consolidating self serve banner and widget ads across many publishers, or for providers of cloud computing who're helping drive hardware costs down).

I would love to hear from anyone who's seeing this trend play out as I mentioned, or any other way. I do recognize that I am forming conclusions based on data patched from multiple sources, but it seems right to me:)

August 20, 2008

New York Times' SEM strategy

I was searching for George Carlin, and saw this ad by New York Times:

It led to this George Carlin obituary on the NYT website. Very valuable information, as a user.

But I am wondering what NYT's business model is for this kind of search engine marketing. Assuming they're amongst the lowest bidders on Google, they'd still end up paying almost 5 cents per click, i.e., they have to make $50 per 1000 visits.

If each visitor does only 1 page view per session, which seems likely given this landing page, NYT would effectively have to monetize at $50 CPM net. Which seems pretty high for the George Carlin obituary. Even at 2 page views per visit, which seems high looking at this landing page, NYT would have to make $25 net.
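
The break-even arithmetic generalizes to any click price and visit depth. Here is a small sketch of it; the 5 cent cost per click is my guess at a near-minimum bid, not a number from NYT or Google.

```python
def breakeven_cpm(cost_per_click, page_views_per_visit):
    """Net CPM needed to recover the cost of a paid click.

    Each click buys one visit, so 1000 visits cost 1000 * cost_per_click,
    spread over 1000 * page_views_per_visit page views.
    """
    cost_per_thousand_visits = cost_per_click * 1000
    return cost_per_thousand_visits / page_views_per_visit


for pages in (1, 2, 5):
    cpm = breakeven_cpm(cost_per_click=0.05, page_views_per_visit=pages)
    print(f"{pages} page view(s) per visit -> break-even CPM ${cpm:.2f}")
```

Even at an optimistic 5 page views per visit the page would need to earn a net $10 CPM just to pay for the click, before counting any content or serving costs.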

The only ads I saw on that page were a WaMu ad and a KBB ad, both of which were untargeted, as well as some text links from Google. Oh, and also some house ads.

Are the folks at NYT assuming a lifetime value to a user "acquired" through SEM? Or are they trying to recover the money they spend on each visit? Or is there no accounting at all, and is it clubbed under some fuzzy "marketing spend" which doesn't have to conform to any profitability metric?

Whatever the answer, it seems to me that outside of shopping and lead gen companies, a lot of companies advertising on Google still haven't figured out the economics of the Adwords marketplace.

August 06, 2008

American Airlines & Kayak: Why no open API?

In a widely reported move, American Airlines recently severed its relationship with booking site Kayak.  Which leads me to wonder: why are other airlines partnering with sites like Kayak and Orbitz in deals that require them to pay a booking fee?

Don't get me wrong - I am not suggesting that these airlines should try to block comparison booking sites. In fact, entirely the opposite.

They should simply put out free APIs or searchable feeds for anyone to use - for free! This would encourage the creation of more booking comparison sites, and the ones with the best user experience would win that game. Kayak and Orbitz would be welcome to use these feeds as well, but they won't get paid any money for a referral. Instead, these sites would have to make money through ads or other means.

The only cost the airlines would have to bear would be the cost of supporting the search requests made on the APIs, which would be just single digit cents per 1000 API calls. By limiting the number of calls per developer (say, a few thousand per day by default), and increasing the limit for trusted sites with proven CTRs and conversion, they can keep this cost very low, to a point where it's negligible.
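
The per-developer limit is simple to implement. The sketch below is a hypothetical in-memory version, just to show the shape of the idea: the class name, the 5000 call default and the API key scheme are all made up, and a real service would keep the counters in a shared store rather than in one process.

```python
from collections import defaultdict
from datetime import date


class DailyQuota:
    """Caps API calls per developer per day, with raised caps for trusted keys."""

    def __init__(self, default_limit=5000):
        self.default_limit = default_limit  # assumed default, "a few thousand"
        self.limits = {}                    # per-key overrides for trusted sites
        self.used = defaultdict(int)        # (api_key, day) -> calls made so far

    def raise_limit(self, api_key, limit):
        # For sites that have proven their click-through and conversion rates.
        self.limits[api_key] = limit

    def allow(self, api_key, today=None):
        day = today or date.today()
        limit = self.limits.get(api_key, self.default_limit)
        if self.used[(api_key, day)] >= limit:
            return False  # over quota: reject without counting the call
        self.used[(api_key, day)] += 1
        return True
```

Each search request would call `allow(api_key)` before touching the fare systems, so the worst case cost per developer per day is known up front.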

This is a much better solution to this problem than blocking Kayak or paying Orbitz, both of which seem like outdated ideas.

I don't understand why airlines aren't doing this already. Does anyone know a good reason?