Thursday, August 19, 2010

Jumper in China?

I was browsing some web stats recently and happened to find a Jumper installation in China. Normally that would not be unusual. China is, after all, our second-largest source of traffic after the US. In fact, this was the third one that I have found in China this month. What was unusual was that it was not inside some Chinese company I had never heard of. No, this one had a public IP address. It was on the public web in China!

This has been a growing phenomenon over the last several months, with public sites popping up all over the world (India, Poland, Estonia, Russia, and Germany just this month). However, no one had yet posted one online in China. Yet there it was: a Jumper search engine in what I think is Mandarin, on the Internet inside China. Wow.

What did it mean? Was someone bypassing the government? Jumper is lightweight and portable, so users could easily move it to another address when needed. Or was it simply small enough to fall under the government's radar? My head was spinning for a second...

It is really quite astonishing to me. This little software program has been nothing short of amazing since I first created it. Jumper started as a simple tagging engine to enrich metadata in a small project with a very limited budget. After the project, I added a search page to it and posted it on SourceForge, thinking that was it.

I returned to the same life sciences company a few months later (on another consulting engagement) and was pleased to see the tagging engine was still integrated into their intranet search. When I reached out to the original project team, several told me, to my surprise, that they had since deployed the full Jumper 2.0 software in their department. When I asked why, the answer surprised me: “If I know where to look I can usually find what I’m looking for - the problem is when I have no idea where to look, then it is almost impossible.” OK, so I paraphrased a little. The point is that it was the discovery aspect of the software that they loved. Enterprise information is distributed. You need to know where to look. With Jumper they could find all kinds of information that they never knew existed. Tagging was merely a means to an end.

And now Jumper could bring down governments? OK, so my imagination got a little carried away with the possibilities… But this I certainly never saw coming. Jumper has always been an enterprise search engine, and I was fascinated by this new use of the software. When I inquired with one of these deployments, what I found were users alienated from the traditional search model. Jumper gave them the tool to create a culturally friendly search engine, created by users like themselves, one that met their unique interests. Lawyers in Estonia could create a search engine that met their culturally unique, local legal needs in a way no vertical or general search engine ever could. Scientists at a university in Germany could do the same, and so could programmers in Russia and developers in India. The potential seems unlimited.

A new global economic and technical infrastructure is emerging, built on networked, social computing. In the next ten years a billion new people around the globe will gain a productive foothold in this economy and become an increasingly significant online force. They will be young and will look to do things differently. The old model of monolithic search provided by a few companies will no longer meet all of their needs. They will be culturally splintered, with vastly diverging interests, and will look for a more flexible search model that will better meet their unique needs. They will shatter the current search model into millions of pieces: culturally unique, community-based, and socially oriented pieces.

From a simple project two years ago to an emerging global phenomenon? Well, perhaps not yet. We still have a long way to go, but things are starting to get very interesting.

Tuesday, August 3, 2010

Building Social into Solr

We have had a number of customers inquire about customizing specific aspects of Solr search with Jumper.

There are really two approaches. The first is to build Jumper tagging into your search engine interface, allowing users to tag documents or content when it is stored. The second is to import Jumper tagging fields into Solr using the DataImportHandler, which is done using basic JDBC connectivity. Tags stored in the Jumper search engine are then imported into the Solr index, attached to a document, and returned when it is searched. Using facet fields, you can allow users to filter search results based on the knowledge tags applied by other users.

This is perhaps the easiest method. The two services can be bundled in a single web interface. In this way you are removing the Jumper search engine and replacing it with Solr. This gives you the best of both worlds, full-text searching and user tagging, to deliver better, more detailed search results.
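As a rough illustration of the import approach, a DataImportHandler configuration might look something like the fragment below. This is only a sketch: the database URL, credentials, the table names (documents, jumper_tags) and the column names are all hypothetical stand-ins, since the actual Jumper schema is not described here, and the MySQL driver is just an assumption about the database in use.

```xml
<!-- data-config.xml: a sketch only. Table, column, and credential values
     are hypothetical; adjust them to the actual Jumper database. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/jumper"
              user="solr_reader"
              password="changeme"/>
  <document>
    <!-- One Solr document per stored document row -->
    <entity name="doc" query="SELECT id, title FROM documents">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <!-- Child entity: attach every tag applied to this document -->
      <entity name="tag"
              query="SELECT tag FROM jumper_tags WHERE doc_id = '${doc.id}'">
        <field column="tag" name="sm_field_keyword_tag"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The nested entity runs once per parent row, so each document picks up all of its tags in a single multi-valued field. The target field must be declared multi-valued in the Solr schema for this to work.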

If you prefer to embed custom search paths into Solr, the primary method is to use facet fields. A Jumper tagging interface can be added when storing documents. The Jumper tag fields are then stored as facet fields that Solr will search in addition to its full-text parsing of the document. Faceting is done on indexed rather than stored values.
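To make the faceting idea a little more concrete, here is a minimal sketch of how a query string that facets on a tag field, and optionally filters by one selected tag, could be assembled. The function name build_tag_facet_query is made up for this example, and the field name and tag values are assumptions; facet, facet.field, facet.mincount and fq are standard Solr request parameters.

```php
<?php
/**
 * Build a Solr query string that facets on a tag field and, optionally,
 * filters results to a single selected tag. (Hypothetical helper, not part
 * of Jumper or of any Solr client library.)
 */
function build_tag_facet_query($keywords, $tag_field, $selected_tag = NULL) {
  $params = array(
    'q' => $keywords,
    // Use the string 'true'; a boolean TRUE would be serialized as 1.
    'facet' => 'true',
    'facet.field' => $tag_field,
    // Hide tags that match no documents in the current result set.
    'facet.mincount' => 1,
  );
  if ($selected_tag !== NULL) {
    // Quote the tag so a multi-word tag is treated as one value, and
    // escape any embedded quotes or backslashes.
    $escaped = addcslashes($selected_tag, '"\\');
    $params['fq'] = $tag_field . ':"' . $escaped . '"';
  }
  // Pass the separator explicitly so the result does not depend on php.ini.
  return http_build_query($params, '', '&');
}

// Example: search for "protein folding", narrowed to one user-applied tag.
echo build_tag_facet_query('protein folding', 'sm_field_keyword_tag', 'assay results'), "\n";
```

The facet counts returned by Solr can then be rendered as a list of tags, with each one linking to the same search plus the matching filter.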

Embedding the tags this way requires that we add a number of Jumper tags to the Solr index separately and add a custom sort to the Solr search. Adding a new Jumper tag field to the search results requires two very small hook implementations from the Drupal apachesolr module: hook_apachesolr_update_index() and hook_apachesolr_modify_query(). To start, let’s just add the keyword tag field to the Solr index.

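/**
 * Implementation of hook_apachesolr_update_index().
 */
function mymodule_apachesolr_update_index(&$document, $node) {
  // Index field_keyword_tag as a separate field.
  if ($node->type == 'profile') {
    // Profile nodes take their tags from the node author's user object.
    $user = user_load(array('uid' => $node->uid));
    $document->setMultiValue('sm_field_keyword_tag', $user->tags);
  }
  elseif (!empty($node->field_keyword_tag)) {
    // Other nodes: add each value of the keyword tag field.
    foreach ($node->field_keyword_tag as $keyword) {
      // Assumes a CCK text field, where each item keeps its string under
      // 'value'; a file field would use 'filepath' instead.
      $document->setMultiValue('sm_field_keyword_tag', $keyword['value']);
    }
  }
}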

All we do is add the data to the index by adding it to the $document object, which is passed by reference. We used the setMultiValue method because the tag field can have multiple values; if we were adding just one value, we would use the addField method. The field name is simply the 'sm_' dynamic field prefix with field_keyword_tag appended: the field contains keyword strings, and in the apachesolr schema the sm_ prefix marks a multi-valued string field.

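Now that the data has been added to the index, we also need to add it to the query so it can be returned with the search results:

/**
 * Implementation of hook_apachesolr_modify_query().
 */
function mymodule_apachesolr_modify_query(&$query, &$params, $caller) {
  // Ask Solr to return the tag field along with the default fields.
  $params['fl'] .= ',sm_field_keyword_tag';
}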

And that's all there is to it… This can be repeated for each of the Jumper knowledge tags that you want to add. All you're doing is some basic PHP string concatenation, appending your newly indexed field to the comma-separated list of fields to return (the 'fl' entry) in $params. We are, though, simplifying the details of the $params format a little for the sake of brevity in this post.
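If you are repeating this for several tag fields, a small helper can keep the concatenation tidy. The sketch below is only an illustration and assumes that 'fl' is a plain comma-separated string; the helper name append_return_field is made up for this post and is not part of the apachesolr module.

```php
<?php
/**
 * Append a field name to the comma-separated 'fl' list in $params,
 * avoiding a leading comma on an empty list and skipping duplicates.
 * (Hypothetical helper for illustration.)
 */
function append_return_field(array $params, $field) {
  $has_list = isset($params['fl']) && $params['fl'] !== '';
  $fields = $has_list ? explode(',', $params['fl']) : array();
  if (!in_array($field, $fields, TRUE)) {
    $fields[] = $field;
  }
  $params['fl'] = implode(',', $fields);
  return $params;
}

// Example: add two Jumper tag fields to an existing field list.
// The second field name is made up for the example.
$params = array('fl' => 'id,title');
$params = append_return_field($params, 'sm_field_keyword_tag');
$params = append_return_field($params, 'sm_field_topic_tag');
echo $params['fl'], "\n";
```

Inside the hook you would call it as $params = append_return_field($params, 'sm_field_keyword_tag'); since $params is passed to the hook by reference.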

In general, adding Jumper social tagging features into your Solr search is pretty easy, and can deliver some very powerful capabilities to your search functionality.