Friday 27 January 2012

Colorless Green Ideas Sleep Furiously


From Wikipedia:
"Colorless green ideas sleep furiously" is a sentence composed by Noam Chomsky in 1957 as an example of a sentence whose grammar is correct but whose meaning is nonsensical. However, some might argue that Chomsky simply wasn't imaginative enough to put the sentence into a context that would give it meaning. It was used to show the inadequacy of the then-popular probabilistic models of grammar, and the need for more structured models.

Chomsky wanted a model with rules and representations, a formal way to describe a language, and imposed his views. But it looks like green ideas do sleep furiously and, when they wake up, grow furiously. Speech recognition systems started to use probabilistic approaches to distinguish between similar-sounding words or phrases. And Google uses the same idea for its machine translation:

Most state-of-the-art commercial machine translation systems in use today have been developed using a rules-based approach and require a lot of work by linguists to define vocabularies and grammars.

Several research systems, including ours, take a different approach: we feed the computer with billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model.

An easy-to-understand explanation of the system is given by David Yarowsky:
Say you want to teach a computer how to translate Chinese: You give the computer 100,000 sentences in English and the same 100,000 sentences in Chinese and run a program that can figure out which words go to which words. If in 2,000 sentences you have the word Washington, and in about the same number of sentences you have the word Huashengdun, and they occur in the same place in the sentence, these words are likely translations.
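Yarowsky's description can be sketched in a few lines of Python. This is only a toy illustration, not Google's system: the four sentence pairs are invented, the function names are hypothetical, and the Dice coefficient is a stand-in scoring rule chosen for simplicity. It also ignores word position, which the quote mentions as an extra clue.

```python
from collections import Counter

# Invented toy corpus: (English sentence, romanized Chinese sentence).
PAIRS = [
    ("Washington is cold", "Huashengdun hen leng"),
    ("I visited Washington", "wo qu le Huashengdun"),
    ("Beijing is cold", "Beijing hen leng"),
    ("I visited Beijing", "wo qu le Beijing"),
]

def dice_scores(word, pairs):
    """Score each target word by how consistently it co-occurs with `word`."""
    src_freq = Counter()   # sentence pairs containing each source word
    tgt_freq = Counter()   # sentence pairs containing each target word
    together = Counter()   # sentence pairs containing `word` and the target word
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        if word in src_words:
            together.update(tgt_words)
    # Dice coefficient: 1.0 means the two words always appear together.
    return {t: 2 * c / (src_freq[word] + tgt_freq[t]) for t, c in together.items()}

def likely_translation(word, pairs):
    """Return the target word with the highest co-occurrence score."""
    scores = dice_scores(word, pairs)
    return max(scores, key=scores.get)

print(likely_translation("Washington", PAIRS))
```

Words that appear in about the same number of sentences on both sides, and in the same sentence pairs, score highest, which is the intuition behind the Washington / Huashengdun example.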

So far, Google has released statistical machine translation systems for English <-> Chinese and English <-> Arabic, but more languages should be available soon.

Hide a Site in Google Page Creator



If you want to create or edit pages without anyone suspecting you have a site at Google Page Creator, go to "Site settings" and check "Hide this site". None of your pages will be visible until you uncheck that option.

It's also a good replacement for "Delete this site", because your uploaded files will be unavailable too.

You can also create four more sites using one account. This feature was initially experimental, then it was removed, and now it's back.

Secret Google JSON API


Google already offers feeds for Google News, Blog Search and Google Video, so you can use the search results in your applications or sites. There's also a Google API for web search that uses SOAP, but it's limited to 1,000 queries per day.

Now Google offers a new kind of API, unified for web search, image search, blog search and video search. The API uses JSON, so creating applications in JavaScript is easy. Keep in mind that this API is unofficial, so the details can change.

Google JSON API is the foundation of SearchMash, an experimental site created by Google.

So how do you get the search results using this API? Load this page:
http://www.searchmash.com/results/[query], replacing [query] with the actual query. If you use the format http://www.searchmash.com/results/[query]?i=11&n=10, you request 10 search results, starting with result number 11. The formats for image search, blog search and video search are:

http://www.searchmash.com/results/images:[query]
http://www.searchmash.com/results/blogs:[query]
http://www.searchmash.com/results/video:[query]

The JSON object you get from Google has a list of members that are easy to understand, like estimatedCount (the number of search results) and results, an array that describes the search results. To make cross-domain requests, you may need to create a web proxy, as shown here.
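The URL formats above can be assembled with a small helper. This is a sketch under a few assumptions: the function names are hypothetical, percent-encoding the query is my guess at what the service expects, and whether the i and n parameters also work for image, blog and video searches is not stated above. The sample response is made up and uses only the two members named above; the fields inside each result are not described here, so the entries are left empty.

```python
import json
from urllib.parse import quote

BASE = "http://www.searchmash.com/results/"

def results_url(query, kind=None, start=None, count=None):
    """Build a SearchMash results URL; kind is 'images', 'blogs', 'video' or None."""
    prefix = f"{kind}:" if kind else ""
    # Assumption: the query is percent-encoded, keeping the ':' separator literal.
    url = BASE + quote(prefix + query, safe=":")
    if start is not None and count is not None:
        url += f"?i={start}&n={count}"
    return url

def summarize(payload):
    """Return (estimated total, number of results in this response)."""
    data = json.loads(payload)
    return data["estimatedCount"], len(data["results"])

# Made-up response containing only the members named in the post.
SAMPLE = '{"estimatedCount": 2, "results": [{}, {}]}'

print(results_url("google", start=11, count=10))
print(results_url("green ideas", kind="images"))
print(summarize(SAMPLE))
```

The helper does no network access, so it can be tried without the API being reachable.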

XSS Vulnerability in Google Search Appliance




Maluc found a cross-site scripting vulnerability in Google Search Appliance, a box that indexes documents from intranets and websites. If you set the output encoding to UTF-7, the appliance doesn't validate the query, so you can pass JavaScript.

Here's one example for Stanford's site that uses Google Search Appliance: stanford.edu.
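Why does UTF-7 matter for the vulnerability described above? In UTF-7, characters like < and > can be written using only plain ASCII letters, so a check that looks for literal angle brackets sees nothing suspicious. The sketch below shows the general idea. It is an assumption about this class of bug, not a description of how the appliance actually validates queries, and naive_filter_allows is a hypothetical validator.

```python
def naive_filter_allows(raw_query: str) -> bool:
    """Hypothetical validator that only rejects literal angle brackets."""
    return "<" not in raw_query and ">" not in raw_query

# "+ADw-" and "+AD4-" are the UTF-7 spellings of "<" and ">".
encoded = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# What a browser would see if it interpreted the page as UTF-7.
decoded = encoded.encode("ascii").decode("utf-7")

print(naive_filter_allows(encoded))  # the encoded form passes the filter
print(decoded)
print(naive_filter_allows(decoded))  # the decoded form would not
```

The usual defense is to validate input after decoding it and to never let a request parameter choose the output encoding.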

Extra Storage for Picasa Web Albums


Chris L. found an option to get more storage in Picasa Web Albums.

* Up to 250GB of storage space in your Picasa Web Albums account.

* 12 months of hassle-free uploading and sharing
No complicated monthly bandwidth limits to keep track of.

Each year, we'll automatically renew your account. But don't worry, we'll always contact you with the option of cancelling before charging your credit card.

You can always use your free Picasa Web Albums account without upgrading.

Choose the amount of storage you want:

6.25GB ($25 per year)
25GB ($100 per year)
100GB ($300 per year)
250GB ($500 per year)

These options seem to be available only to US users. On the other hand, Yahoo Photos, which was recently updated, has free unlimited storage and more features than Picasa Web Albums (like tags, ratings, search, photo editing and private albums).

New in Page Creator: Photo Editing and Mobile Sites


When you add a photo to Google Page Creator, you'll see new options. You can now crop a photo, rotate it, change the brightness, mix it with another photo, change the contrast, reduce the colors or sharpen the photo. Basically you can apply simple effects from your browser. In the screenshot below, I've used the mash-up effect.



Now every page created in Google Page Creator can be easily accessed from a mobile phone, as Google redirects it to the transcoded version, the same way it does with the search results. That is, of course, if you manage to enter the long URL correctly.

Also interesting:
Hide a site in Google Page Creator
Page Creator supports JavaScript

Faster, More Convenient Holidays With Google Checkout


I don't know why, but every piece of news about Google Checkout has something ridiculous and earthly. But the latest news is just too much:

"According to a new survey conducted by Harris Interactive and commissioned by Google Checkout, 40% of employed U.S. adults say they'll be doing at least some of their online holiday shopping from work this year, with 1 in 4 of those shoppers logging on to track down that perfect gift on Monday, November 27 (57% plan to shop during coffee and lunch breaks, while 34% will wait until the end of the workday)."

So 10% of employed U.S. adults (a quarter of that 40%) will try to find the perfect gift on Monday, November 27. And Google decided to launch a version of Checkout for the holidays on Monday to capitalize on this. Buyers will get $10 off purchases of $30, or $20 off purchases of $50, while sellers get free processing. And everyone will be happy. Google executives think this holiday season is the last chance for Google Checkout, and they'll do everything to make their product successful.

"Trying to squeeze online holiday shopping into already busy schedules, shoppers will be looking for even more speed and convenience this year. And while there are many online shopping options to make finding the right gift relatively easy, online shoppers still have to deal with hassles, such as entering billing, shipping, and contact information multiple times as they move from site to site. Google Checkout eliminates an average of 15 steps from the online checkout process, in many cases making checking out as simple as entering a single login. This can save a lot of time for online shoppers, who will visit an average of 5.5 websites for holiday gifts this season, according to the survey."

Squeeze, shoppers, hassle. More speed, convenience. Happy holidays!

Private Picasa Web Albums? Almost


I don't see why the concept of a private album should be debatable. An album is private if it can be accessed only by the author and a list of people invited by the author.

Google decided to replace the concept of a private album with the unlisted album. Basically, anyone can access that album if they know its title and the author's Gmail address, or the URL of a public album. Google even suggested choosing strange names for unlisted albums, so they're difficult to guess.

Now Google adds a parameter to the URL of an unlisted album, like:
http://picasaweb.google.com/[gmail address]/AlbumName?authkey=blabla, and denies you access if you don't specify that authentication key. But there's still a problem: anyone who enters the complete address can see the album, and the address can be indexed by search engines if someone links to it. So much for a private album.
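To make the problem concrete, here is a minimal sketch of a check where the key in the URL is the only credential. This is an illustration of the scheme as described above, not Google's implementation; the function names and the "someuser" account are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def album_access(url):
    """Split an album URL into (owner, album name, authkey or None)."""
    parts = urlsplit(url)
    owner, album = parts.path.strip("/").split("/", 1)
    authkey = parse_qs(parts.query).get("authkey", [None])[0]
    return owner, album, authkey

def can_view(url, album_key):
    """Hypothetical server-side check: no login, only the key is compared."""
    return album_access(url)[2] == album_key

full_url = "http://picasaweb.google.com/someuser/AlbumName?authkey=blabla"
bare_url = "http://picasaweb.google.com/someuser/AlbumName"

print(album_access(full_url))
print(can_view(full_url, "blabla"))  # anyone holding the full URL gets in
print(can_view(bare_url, "blabla"))  # guessing the album name is no longer enough
```

Nothing in the check identifies who is asking, so the album is exactly as private as the URL is.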

More context:
Picasa Web Albums launch
No private albums
Authkey parameter makes its appearance

GOOG, More Than $500


CNN Money reports that for the first time, Google stock jumped above $500. "Google is up more than 20 percent this year, far outperforming fellow Internet giants Yahoo!, eBay and Amazon.com, whose shares have all slumped in 2006. Google went public in one of the most widely awaited IPOs in recent memory in August 2004 at $85 a share."

Google shares closed at $509.65.

The New Google Book Search


Google Book Search has a completely new interface that uses AJAX. Unlike before, you can read a book without clicking on the "next page" button. You can just use the scrollbar or the arrow keys, like in Adobe Reader. There are new options: zoom in and zoom out, which increase or decrease the size of the text in a book, and there's even a full-screen mode.

The table of contents is displayed in the right sidebar, so it's easy to go to another section. Searching inside a book is much faster, as the results are displayed without reloading the page.

Google offers a dedicated page for each book (like this one), where you can find the description, related books, references from books and scholarly works, and some key terms that may help you discover other interesting books.

Now you can actually read books in Google Book Search (if copyright laws allow it, of course).