Integrating search in your site - Yahoo BOSS and LINQ to XML
For quite some time the search bar on my sites has used an old XML Web Service based API from Google. It worked perfectly and provided a provider-agnostic set of search results for embedding in the site. Unfortunately, Google pulled the plug on their free SOAP API in favour of their AJAX/REST/advert-laden interfaces. This only became apparent when I was showing someone the search functionality in the site. So the hunt was on for an alternative.
Requirements
There were only a few requirements for the alternative provider:
- I didn't want to write my own site search
- The search only needed to work within my own site(s)
- The user interface must be completely under my control (i.e. no change from the previous version)
- The results should be in a form that could be integrated server side into my own custom format.
- No JavaScript - accessibility is key!
- No advertising!
A brief search (sic!) revealed just the ticket: Yahoo BOSS.
How it works
The concept is very simple. You access a RESTful URL with your search term in it and a query string to set additional parameters. The service returns the search results in a selected format.
It currently supports two formats: JSON (which is a non-starter as server side parsing is much more complex in an ASP.NET scenario) and XML.
The service is only available if your site is registered with Yahoo, and a key is provided which is tied to your domain. The basic url format (all on one line) is:
http://boss.yahooapis.com/ysearch/web/v1/"bonfire night"?
appid=yourkeygoeshere&
format=xml&start=1&count=10&sites=www.yourdomain.com
The above url will return the top ten results from domain www.yourdomain.com for the search terms "bonfire night" in XML format.
The XML result contains a collection of <result> nodes, each of which contains the page URL, its title and a brief snippet from the page, along with other information.
Crafting a solution
Assume we have a page with a textbox (id="q") and a search button linked to your search results page. Clicking the search button sends the textbox value in the query string. We can retrieve the search term and URL-encode it, so that it is safe to embed in the request URL, as follows:
string search = Uri.EscapeDataString(Request.QueryString["q"] ?? string.Empty);
We can set up a format string from which to build the request URL:
string searchFormatString =
    "http://boss.yahooapis.com/ysearch/web/v1/\"{0}\"?" +
    "appid=yourkeygoeshere&" +
    "format=xml&start=1&count=10&sites=www.yourdomain.com";
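As a rough sketch of how the URL construction might be wrapped up in one place, the helper below builds the request URL from a search term. The names BossUrl and Build, and the appId and site parameters, are illustrative only and are not part of the Yahoo API. It assumes Uri.EscapeDataString is an acceptable way to percent-encode the quoted term.

```csharp
using System;

public static class BossUrl
{
    // Hypothetical helper: {0} = escaped, quoted term; {1} = app id; {2} = site.
    private const string Format =
        "http://boss.yahooapis.com/ysearch/web/v1/{0}" +
        "?appid={1}&format=xml&start=1&count=10&sites={2}";

    public static string Build(string term, string appId, string site)
    {
        // Wrap the term in quotes for a phrase search, then percent-encode it
        // so spaces, quotes and '&' cannot break the path or the query string.
        string quoted = "\"" + (term ?? string.Empty) + "\"";
        return string.Format(
            Format,
            Uri.EscapeDataString(quoted),
            Uri.EscapeDataString(appId),
            Uri.EscapeDataString(site));
    }
}
```

Calling BossUrl.Build("bonfire night", "yourkeygoeshere", "www.yourdomain.com") produces a URL whose path segment is %22bonfire%20night%22, i.e. the quoted phrase with the quotes and the space percent-encoded.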
The System.Xml.Linq namespace in the .NET Framework 3.5 SP1 provides a queryable interface for working with XML documents. The XDocument class has a static Load method which returns an XDocument object from a URI. So, retrieving the search results is as simple as using the following statement:
XDocument results =
XDocument.Load(string.Format(searchFormatString, search));
Parsing the return result set
This seems quite straightforward using LINQ to XML, until you actually try it and wonder why the result set is always empty, even though the XML document contains a valid set of results. A bit of debugging revealed that the names you pass to the Element and Descendants methods must be qualified with the document's XML namespace. My code, which attempted to access all the descendants of the document with the name "result", didn't work until I used the name "{http://www.inktomi.com/}result", and likewise for all the other elements.
The LINQ query to return the set of results into a convenient list of data is:
var q =
    from r in results.Descendants("{http://www.inktomi.com/}result")
    select new
    {
        url = r.Element("{http://www.inktomi.com/}url").Value,
        title = r.Element("{http://www.inktomi.com/}title").Value,
        snippet = r.Element("{http://www.inktomi.com/}abstract").Value
    };
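An equivalent way to express the qualified names is with an XNamespace value, which saves repeating the namespace string in every lookup. The following is a minimal sketch, not the code used on the site. The ResultParser class and its method names are made up for illustration, and it parses an XML string rather than loading from the service. Casting the element to string, rather than reading .Value, is a choice made here so that a missing child element yields an empty string instead of throwing.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ResultParser
{
    // The namespace used by the element names in the query above.
    private static readonly XNamespace Ns = "http://www.inktomi.com/";

    // Returns the titles of all <result> elements; a missing <title> yields "".
    public static string[] Titles(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants(Ns + "result")
                  .Select(r => (string)r.Element(Ns + "title") ?? string.Empty)
                  .ToArray();
    }

    // Counts <result> elements looked up WITHOUT the namespace (the pitfall).
    public static int CountUnqualified(string xml)
    {
        return XDocument.Parse(xml).Descendants("result").Count();
    }
}
```

Given any document whose root declares xmlns="http://www.inktomi.com/", CountUnqualified returns 0 however many results the document holds, which is exactly the empty result set described above, while Titles finds them all.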
We can then iterate through the results and format them in whatever way we like. For example:
foreach (var result in q)
{
    SearchResults.Text += string.Format(
        "<p><a href=\"{0}\">{1}</a></p><blockquote><p>{2}</p></blockquote>",
        result.url, result.title, result.snippet);
}
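For longer result lists, one possible variation is to build the markup with a StringBuilder and assign it to the control once, rather than appending to the Text property on every pass. The sketch below is an illustration under assumptions: SearchHit and ResultRenderer are hypothetical names, and WebUtility.HtmlEncode requires .NET 4 or later (on 3.5, HttpUtility.HtmlEncode from System.Web plays the same role). Only the URL is encoded, because it lands inside an attribute. The title and snippet are passed through unchanged, as in the loop above, and whether they need encoding depends on whether the service returns markup in them.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical container for one search result.
public class SearchHit
{
    public string Url { get; set; }
    public string Title { get; set; }
    public string Snippet { get; set; }
}

public static class ResultRenderer
{
    // Builds the same markup as the loop above, in a single buffer.
    public static string Render(IEnumerable<SearchHit> hits)
    {
        var html = new StringBuilder();
        foreach (SearchHit hit in hits)
        {
            html.AppendFormat(
                "<p><a href=\"{0}\">{1}</a></p><blockquote><p>{2}</p></blockquote>",
                WebUtility.HtmlEncode(hit.Url), hit.Title, hit.Snippet);
        }
        return html.ToString();
    }
}
```

The returned string can then be assigned in one step, for example SearchResults.Text = ResultRenderer.Render(hits).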
Summary
The above example demonstrates the power of LINQ to XML for the parsing of an XML document, and also gives you an idea of how I managed to get out of the hole that Google left me in by pulling the plug on their SOAP Search API.
Note: the basic Yahoo BOSS search is limited to 5000 searches per day.