Looking For Published Google Docs? Bing It!

bing google docs

In September 2009, a Google employee, Marie, announced via the Google Docs forum that published Google Docs (which are made public by default) would be crawled, indexed, and surfaced in search results:

This is a very exciting change as your published docs linked to from public websites will reach a much wider audience of people.

In other words, Google Docs wanted to flip its users’ data public, Facebook-style, to make more money out of it. In July 2009, two months before the announcement that linked-to published docs would be indexed, the Google Docs team removed a feature that automatically tagged published documents as “published”, making it very hard for users to keep track of which documents were potentially indexable. Technically, only docs linked to from a Website were crawlable and indexable, but as one commenter on the forum put it:

If someone who knows the URL for my published document, they can link to it from their own publicly crawled webpage, thwarting my attempts to avoid having it crawled. That means that I’m at the mercy of anyone who gets their hands on the URL for my published Doc.

Today, Google has not done what Marie said it would. If you search for “pdf site:docs.google.com” on Google, no documents are found. The same query on Bing returns some 1,500,000 documents. How come?

Even after you publish your documents, spreadsheets or presentations, they won’t appear in the Google search index; however, other search engines may potentially index published docs. (from the Google Docs Forum)

Google probably backtracked to avoid a PR mess around a central product of its recently-launched Pro Suite. Even if indexing was a good idea, strategically it was too easy for the competition to hurt the image of Google Docs by arguing that Google could leak your sensitive business information to the world. Nevertheless, instead of giving up the fight, Google decided to shut that argument down by letting the competition crawl and index Google Docs’ library of published docs. In the case of MSFT/Bing, for example, Google is giving away a priceless source of information to Bing users, making Bing a more attractive search engine. On the other hand, Bing users get to discover and get used to the universe of Google Docs. Google traded a bit of its search pie for a bit of Microsoft’s Office pie. Interesting!

Google Ajax Search Could Shut Us Down

Has anyone seen this recently in their traffic analytics?

Did Google finally post a link from their homepage to your site? Dream on! Google is working on integrating a new search experience built heavily on Ajax. The consequence is that analytics tools will be referrer-blind. Not quite, but almost. Here is how Sean from Clicky summarizes it:

Here’s what the new search result URLs look like with the new “Ajax” feature:


See how there’s a hash mark # in there now, and the “q=test” is after it? The problem is that web browsers don’t send anything after the # in the referrer string. This means organic searches from Google will now show up as just “http://www.google.com/”, with no search parameters. In other words, no analytics app can track these searches anymore.
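The mechanism Sean describes is easy to verify yourself. Here is a minimal sketch (the example URLs are illustrative, not taken from the original screenshot) showing why an analytics script parsing the Referer header loses the search terms: the query string of a classic result URL survives in the referrer, but anything after a `#` fragment never leaves the browser.

```python
from urllib.parse import urlparse, parse_qs

def extract_search_terms(referrer):
    """Pull the q= parameter out of a Google referrer URL, if present."""
    parsed = urlparse(referrer)
    return parse_qs(parsed.query).get("q", [None])[0]

# Classic result URL: the query string is part of the path the browser
# requested, so it survives intact in the Referer header.
old_style = "http://www.google.com/search?hl=en&q=test"
print(extract_search_terms(old_style))  # -> test

# Ajax-style URL puts "q=test" after the '#'. Browsers strip the fragment
# from the Referer header, so the analytics server only ever sees this:
ajax_referrer = "http://www.google.com/"
print(extract_search_terms(ajax_referrer))  # -> None
```

Nothing is wrong with the parsing code in the second case; the information simply never reaches the server, which is why no analytics app can recover it.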

The reaction in the community is divided: some think it will get fixed; some think that Google is doomed if they try to blind us all; others believe that as a business, Google is in its right to do whatever they feel like. Personally, I am torn between two things:

1. Google’s philosophy is geared towards openness. I trust this brand.

2. There are just too many changes going on right now: the economic climate is pushing Web companies to make cash, and Google is no exception. It has already shut down a few services. Companies like Twitter opt for VIP access to make money. Facebook has always been a walled garden, and it is growing at a scary pace.

And of course, Google Blog Search ‘broke’, and it doesn’t seem like they are about to fix it.

So which road will Google take here? If they make referring keywords private, it will of course generate a shameless amount of money for the company. It would be so ‘evil’, though, that it would seriously hurt the brand. I don’t think Google would cut us off this way, but by switching technologies, they gain leeway to negotiate a new relationship with publishers based on new rules.

A Time/Space Search Engine That Doesn’t Exist Yet

I just finished writing a post on the Click2Map blog, Is Pointless Geotagging Disturbing?, about geotagging and how its integration into our existing social networks feels awkward. To explain how I feel about geotagging, I draw a comparison with using our real names online, which went from being a big no-no ten years ago to being the best way to connect with the people we already know.

I want to expand on the thought that geotagging is used in an awkward manner today, but that it could become a rich resource for a search engine that doesn’t exist yet.

Journalism is mutating these days; its traditional top-to-bottom information flow is being reversed. Citizen journalism is breaking through. Twitter has proven to be a unique news breaker, because each word submitted to the database gets instant visibility. Therefore, the news breakers have become the people who witness events on the spot. They spread the word way faster than a traditional newspaper would.

Citizen journalists are not professionals. They are individuals who happen to be at a specific place, at a specific time, during a specific event. What they express during this specific place/time/event has value because it provides info on an occurrence that raises questions for us. It is a basic socio-psychological mechanism in action: if an event generates anguish, humans need info to appease that anguish, to make sense of what is going on. We created God to make sense of the world. Rumors spread because they provide answers to questions that have none.

Citizen journalism makes geotagged data useful by tracking individuals who meet the time/space criteria associated with an event. I don’t see geotagged data used in any other way today. However, here are a few ideas of how it could be:

Sometimes, people will place an ad in a newspaper’s classified section to find a person with whom they exchanged eye contact one morning in the subway on the way to work. Let’s say I am publishing such an ad: if there were a database holding the GPS trail generated by every individual out there, I could run a time/space search and send an alert to the owners of all the devices that match my query. The alert could be anonymous, and would simply invite the sought-after person to reply if they want to get in touch.

Presented as is, the idea might sound a little flaky, but I think it should be divided into two main components:

1. The geotag search engine, kind of like Google’s database of geotag data, which would gather all that info and make it available to trusted third-party Websites.

2. The already existing social networks that have become trusted carriers of our online identities.

The geotag search engine would just provide the time/space intelligence that ties knots between GPS trails. Once an individual expresses the desire to connect with another GPS trail, the search engine could ping the existing social networks, signaling that someone is trying to get in touch with one of their members. The individual is then free to accept or refuse the connection request, and everything remains anonymous.
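The core of such an engine, the time/space query, is simple to sketch. Here is a minimal, entirely hypothetical version (the trail format, device ids, and default thresholds are my own assumptions, not anything that exists today): given a set of anonymous GPS trails, return the devices that passed near a point within a time window.

```python
import math
from datetime import datetime, timedelta

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometers."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def time_space_search(trails, lat, lon, when,
                      radius_km=0.5, window=timedelta(minutes=15)):
    """Return the ids of every trail that passed near (lat, lon) around `when`.

    `trails` maps an anonymous device id to a list of (timestamp, lat, lon)
    points -- the GPS trail each device leaves behind.
    """
    hits = set()
    for device_id, points in trails.items():
        for ts, plat, plon in points:
            if abs(ts - when) <= window and \
               haversine_km(lat, lon, plat, plon) <= radius_km:
                hits.add(device_id)
                break
    return hits

# Two commuters: one was on the subway platform around 8:45, the other
# only passed through at noon, outside the time window.
trails = {
    "device-a": [(datetime(2009, 6, 1, 8, 45), 48.8566, 2.3522)],
    "device-b": [(datetime(2009, 6, 1, 12, 0), 48.8566, 2.3522)],
}
print(time_space_search(trails, 48.8566, 2.3522, datetime(2009, 6, 1, 8, 50)))
# -> {'device-a'}
```

The anonymity model follows from the design: the caller only ever sees opaque device ids, and it is the social network, not the search engine, that knows who is behind each one.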

So to get back to my initial idea, I think there is a wide-open opportunity in the search space for a geotag search engine. It is not really clear today why this information would be useful to the mainstream user, but I have no doubt that the more we connect through GPS-enabled devices, the more it will start to make sense. And once people understand the value of such a tool, the search engine that is technologically ready to answer time/space queries will rise very high.

I know most of you think it is a scary idea to be directly contacted by just anyone. It’s not. I interact on a daily basis with other peeps on Twitter, even though I have never met most of them. By following links, they can get my real name, probably my address, the restaurants I go to, the place where I work, the things I like, and so on. That has never led to any undesirable event, and I don’t think it ever will.

What do you think?

Tracking Keyword Subscribers On Twitter Search

twitter rss search

Monitoring keywords on Twitter has become more fun than following users, and I think that everybody is starting to feel those invisible eyes scrutinizing their updates:

[blackbirdpie url=”http://twitter.com/#!/steveasher/status/1033241731″]

I started following him after seeing this message… The increasing use of Twitter Search feeds has created the need for a new Twitter tool I can’t seem to find anywhere on the Web: a feed analyzer that can tell me how many people have subscribed to a given keyword. For example, if I knew that 5,000 people had subscribed to the keywords “Twitter sucks”, then the title of my next blog post would contain those keywords.


Unfortunately, I wouldn’t be surprised if that were the black gold with which Twitter intends to monetize its service. I am sure that most people pay much more attention to the keywords they track, and much less to the updates of the people they follow. As a business, I would be more interested in reaching those keyword subscribers. I would optimize my tweets so that they reach those subscribers’ RSS readers or email inboxes. I could even run an affiliate business where specific keywords lead the way from tweet to Amazon recommended products.
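To make the optimization idea concrete, here is a tiny hypothetical sketch. The subscriber counts and the `estimate_reach` function are pure invention (the whole point of the post is that no such data source exists), but they show how a tweet author could score a draft against tracked keywords before posting.

```python
def estimate_reach(tweet, keyword_subscribers):
    """Sum the subscriber counts of every tracked keyword the tweet contains.

    `keyword_subscribers` is a hypothetical map of tracked keyword phrase
    -> number of people subscribed to that keyword's search feed.
    """
    text = tweet.lower()
    return sum(count for kw, count in keyword_subscribers.items()
               if kw.lower() in text)

# Made-up numbers from the imaginary feed analyzer described above.
subs = {"twitter sucks": 5000, "geotagging": 1200}
print(estimate_reach("Why Twitter sucks at geotagging", subs))  # -> 6200
```

With real numbers behind it, picking a blog post title would become a simple matter of maximizing this score.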

What do you think? First, have you heard of such a service for Twitter? If not, do you think this is how Twitter intends to monetize their platform, or do you think this tool will see the light of day in the near future?

Powerset’s Launch and Sell Strategy


Powerset announced today on their blog that the transaction with Microsoft has been finalized. The cool natural language search engine isn’t a wriggling fresh startup anymore; it has made a Darth Vader move toward the dark side of Microsoft’s heavy search infrastructure.

So Powerset isn’t one of the hype search startups of the Valley anymore. There has always been much ado about Barney Pell’s ability to buzz and sell startups, but I think that Powerset’s story is a case study that should be taught in tech economy classes.

First, consider the unusual amount of press they got on Techcrunch since August of last year:

That is more than 15 articles featuring Powerset in a year. Web startups usually get an average of 0.4 reviews a year on Techcrunch. In Powerset’s case, their news got covered when they were looking for a new CEO and when they made a case study with Miss South Carolina, and 4 articles were about their acquisition. Rather unusual… Undoubtedly, the PR firm contracted by Powerset did a good job creating compelling news about the company. Nonetheless, not all their clients get such good coverage. Powerset’s strength lay in their intrinsic buzz strategy:

  1. The company is a search engine – it positions Powerset as a virtual threat for Google (i.e. as an opportunity for Microsoft).
  2. Their first product is a Wikipedia search tool – it positions Powerset as a portal for one of the most popular Websites of all times.
  3. The search engine focuses on natural language search queries, a fantasy world – As Lorenzo Thione states in the interview above: “If you can crack that nut of understanding human language, with algorithms, with computers, then you open up the door to something that has been part of the collective imaginary for a long time”.
  4. The founder has a great deal of startup experience – Barney Pell has an impressive resume in the fields of search and VC activities.
  5. Powerset is part of the Silicon Valley’s cool kids gang – they co-organize exciting events for the local tech community.
  6. They get social media coverage – I don’t know what their deal is with top bloggers, but it works.
  7. All the factors above create a bubble around the company’s name that consequently over-values the initial search engine project.