
Tuesday, February 22, 2011

Part 2: Building a Library App

I recently read Nicholas Carr's article in The Atlantic, "Is Google Making Us Stupid?" Despite the humorous title, the article has also spawned a book that takes a very introspective look at the impact of the web on our brains. The premise is that the web is spawning a generation of knowledge surfers. Much like their channel-surfing counterparts, they have a diminished capacity for digesting knowledge beyond the cursory surface. Or so the premise goes.

The web has become a replacement for the complex information density traditionally found in books. Our flickering attention has been spread across a broad superficiality. In essence the web (and perhaps all modern media) has made it more difficult for us to read in a lengthy and contemplative way. We seem less able to concentrate, easily lose focus, and become fidgety (after only a few pages in my case).

This is a troubling trend for libraries. The expanding superficiality within our culture, first in newsprint, then on television, and now in the actual knowledge that we seek, means fewer visitors to the library.

While the web provides a quick answer to any question, it should represent only the beginning and not the end of the quest for knowledge. Unfortunately, real knowledge is not available so quickly or easily. Google Scholar is a great place to start (giving credit where credit is due) but is sadly incomplete. It can barely compare to even the smallest of public libraries as an information source. It is just sad to think that, with all the potential resources available, this is the only source that most students will ever use, for convenience's sake. But it doesn’t have to be this way.

Where to Start
At the ULA Conference the prevailing wisdom was to build catalogs like Google: keyword-text, associative search engines. But you're competing with Google for user interest. In this more competitive environment you need to be unique. You need to innovate. Look beyond Google to more cutting-edge systems like Aardvark, Mahalo, Delicious, StumbleUpon, Digg and others. Google certainly is.

So where do we begin to build a library app for your iPad or iPhone? Well, to start it should reflect the values of your local library. Values of community and participation, of openness and learning. The app must capture that partnership between the town, city or university and the community. In that sense it must be different than traditional search.

A library is so much more than just a building. It is also its people: librarians, volunteers, and patrons. From my personal experience librarians have always been able to provide a recommendation or even point me toward a new and unexpected discovery. A library app must preserve this richness. But how?

Winning Hearts and Minds
Rather than lose the insight, the personal experience of a librarian, the world of eBook search should preserve it, expand upon it, and in the best spirit of the democratic web allow all members to have a voice in it. Why not offer more than just MARC records? Allow users to provide feedback, comments and ratings. Allow them to become contributors to the service.

When you search Amazon you get the typical vertical search results. But when you click on a title it takes you to the book's own page. Here you can find a great deal of information about the book. They use this page to "sell" you on the book. It has the typical catalog information (title, author, date published, number of pages, ISBN, etc.) but it also has a detailed publisher description, user reviews, even ratings. I find the other reader reviews to be fascinating and often a key to whether or not I buy the book. Most reviews are thoughtful, insightful and provide that shared knowledge about the book that I [used to] love when I visited my local library (in some ways it is better). It is critical that this type of input and feedback still be available when I search for books.

Shouldn't a library app offer the same richness? Shouldn't every eBook have its own page? Simply adding user comments and a rating system to your current digital catalog is missing the point.

Look at Wikipedia. Who would have thought that tens of thousands of online volunteer users could work together to build and ultimately make obsolete the traditional encyclopedia? Microsoft’s Encarta made Britannica obsolete, and within a decade Wikipedia made Encarta obsolete. There is a lesson here about competing with Google. Your users don't want flat, one-dimensional platforms that are cold, impersonal, even authoritarian. That's a recipe for failure. A library app must be fun, open, and participatory.

The community, your community, must have input in the process, a stake in building the platform with you. Like Wikipedia it must embrace the example of self-governing user participation. Let the users "own" it in a sense. That's how we feel about our libraries. And a library app, which at its core is a search engine to find books, must capture the same spirit as Wikipedia. Present a partnership with the community to help build and curate the search engine, lovingly create each eBook page, keep it current and informative. Maybe even, dare I say it, cutting edge.

Anything Else?
Jumper embraces both taxonomies and folksonomies. We believe that both are inherently valuable in their own way. While Jumper does provide an extensive method for classification, we are not a semantic web tool. At least not yet. The reality is that most users have no idea what the Semantic Web is. I have toyed with including the ability for users to provide RDF (or OWL) attributes in Jumper using pull-down menus with standard terminology, but in numerous tests it has been met only with confusion and questions.

Everyone, however, gets tagging. Anyone can understand and use Delicious. And there is a real power in open folksonomies, even in scholarly research. Jumper grew out of the life sciences industry and the simple reality was the knowledge managers were not scientists. They worked with scientists to create taxonomies but often the science evolved faster than they could agree on terms.

Taxonomies are by nature rigid and unforgiving. Folksonomies allowed scientists to organize information based on their research and this freedom was widely embraced. And for a library app that aims to engage a young audience this freedom will be hugely popular. My own kids certainly have their own way of thinking about things that, to be honest, is almost another language to me at times. You risk losing this audience if you don't allow this self-organizing, self-governing freedom.

One Last Feature
The hardest aspect to capture is the power of new discovery. Oftentimes a librarian could point me to new material that I would never have found otherwise. This power of discovery is perhaps the most important to maintain. That means allowing information to be linked together directly, based on informal associations.

Users can provide these links. Much like the Amazon feature that states "Users who bought this book also bought..." Ok so you are not selling books and these Amazon references were often way off base. Jumper instead relies on user experience. A user who reads a book often has an interest in the subject and has read many books on the subject. Tapping into this experience allows for the type of new and unexpected discovery that will keep a library app fresh and exciting. A library app must allow for users to input links directly to new material.
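As a rough illustration of user-supplied links, here is a minimal sketch in Python. It is not Jumper's actual code. The `LinkStore` class, its method names, and the idea of ranking links by how many readers suggested them are all assumptions made for the example.

```python
from collections import defaultdict


class LinkStore:
    """Hypothetical in-memory store of reader-suggested links between books."""

    def __init__(self):
        # source book -> linked book -> set of users who suggested the link
        self._links = defaultdict(lambda: defaultdict(set))

    def add_link(self, user, source, target):
        """Record that `user` points readers of `source` toward `target`."""
        if source == target:
            raise ValueError("a book cannot link to itself")
        self._links[source][target].add(user)

    def related(self, source):
        """Return linked books, most-suggested first, ties in name order."""
        targets = self._links.get(source, {})
        return sorted(targets, key=lambda t: (-len(targets[t]), t))


store = LinkStore()
store.add_link("ann", "book-a", "book-b")
store.add_link("bob", "book-a", "book-c")
store.add_link("cat", "book-a", "book-c")
print(store.related("book-a"))
```

Because each link remembers who suggested it, a link offered by several readers can be shown ahead of one offered by a single reader, which is one simple way to let experience rather than purchases drive the recommendations.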

Your Role
Of course the most important thing is the role of librarians in this new library app. No library, even a digital library, can exist without some level of real governance. And that is the importance of the editor role in the Jumper library app. Any user can be a contributor. You may decide that only registered users with a library card can be contributors. Or you may open it up more broadly and allow simple online registration.

However you choose to provide access, none of these user comments, ratings, edits, links or other contributions get published into the system until they are approved by an editor. By you, the librarians. Wikipedia can be defaced openly, with editors coming along after the fact to clean it up. Jumper is different. We control spam by requiring all contributions to be approved by an editor before being published. This places the librarian at the center of this new web 2.0 library app. As the keeper of the system you play a vital role in managing contributions to it.
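The approve-before-publish rule can be sketched in a few lines of Python. This is only an illustration of the workflow described above, not how Jumper implements it. `Contribution`, `ModerationQueue`, and the status names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Contribution:
    author: str
    book_id: str
    kind: str   # e.g. "comment", "rating", "link"
    body: str
    status: Status = Status.PENDING


@dataclass
class ModerationQueue:
    editors: set
    items: list = field(default_factory=list)

    def submit(self, contribution):
        # Every contribution starts out pending, whoever wrote it.
        contribution.status = Status.PENDING
        self.items.append(contribution)
        return contribution

    def review(self, editor, contribution, approve):
        # Only editors (the librarians) may approve or reject.
        if editor not in self.editors:
            raise PermissionError(f"{editor} is not an editor")
        contribution.status = Status.APPROVED if approve else Status.REJECTED

    def published(self, book_id):
        # Readers only ever see approved contributions.
        return [c for c in self.items
                if c.book_id == book_id and c.status is Status.APPROVED]


queue = ModerationQueue(editors={"librarian"})
comment = queue.submit(Contribution("pat", "book-a", "comment", "A great read."))
print(len(queue.published("book-a")))  # nothing is visible before review
queue.review("librarian", comment, approve=True)
print(len(queue.published("book-a")))
```

The point of the design is that spam never reaches the public page. The cost is that someone has to work through the queue, which is exactly the editor role described here.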

The Final Analysis
In the poignant words of the playwright Richard Foreman are we forever doomed to become “pancake people—spread wide and thin as we connect with that vast network of information accessed by the mere touch of a button.”

The reality is that the 'pounding on my head' of information overload is not going to end any time soon. Simply staying the course, following the well worn grooves of the past will not make it go away. We must take bolder steps to create the type of digital world that we want. We must demand a corner of the web we can call our own. That reflects our shared values and beliefs. An app that can search beyond that smooth surface and deep into the dense, cathedral-like structures of knowledge available in our libraries.


<  Previously Read Part 1

Tuesday, February 15, 2011

Part 1: Building a Library App

So they are closing down libraries in the UK. I guess that comes as no surprise, but still a sad story. Branches have been closing in many towns and cities in the States for years. The library was once the sole entity bringing open knowledge into these communities. With the growth of the web that role has rapidly diminished.

The real driver diminishing the need for libraries as we know them is the advent of eBooks. It is only a matter of time before all books are digitized. And so beyond a public space in the community to bring people together (a great value by itself) what is the role of libraries going forward? How do they redefine their niche?

It is essential that institutions remain to provide open access to books and knowledge. To lose this in the digital age would be a great tragedy. Just because the need for the four walls of a library disappears, the concept of what a library represents certainly should not.

The Mission

Here are a few mission statements of local libraries:

- The Howland Public Library provides materials and services to help community residents obtain information meeting their personal, educational and professional needs. Special emphasis is placed on supplying adults with current reading materials; on providing reference services to students (at all academic levels).

- The Mission of the Beekman Library shall be to assure effective, expanding, free library service for the community of Beekman and to lead citizens in anticipating their future needs for library services.

- It is the mission of The Alice Curtis Desmond and Hamilton Fish Library to provide access to the world of social and cultural ideas to the community by offering a wide variety of materials and programs. The Library has a special mission to young children and their parents to encourage a love of reading and learning.

The mission of the library is more important than ever in the modern web world. Web sites are rife with incomplete or, worse, completely misleading or slanted information. Is this the only type of information access we want to provide to our children?

Super Libraries

New super-libraries are not the answer. There are a number opening or planned to open in the UK, such as the one in Birmingham. The new library is described as a glass building wrapped in delicate metal filigree. Sounds more like a mall than a library. Should a library become more like a Borders or a Starbucks to survive? Maybe, until you realize that Borders is struggling and likely to go under. The victim of Amazon and the ever-expanding online world.

As the world all around us changes, why do we have such a hard time adapting our concepts from the past to this new world of the future? Why does a library have to have four walls at all?

Running a successful public library in the 21st century is tough. Foot traffic is down and book loans are massively down. In the UK only 14 of 151 local authorities have libraries that offer eBooks. Rather than investing in building these new monstrous libraries, shouldn't the investment be geared toward digitizing libraries across the country and making them available online? That means working with DRM providers to allow books to be checked out to an iPad or Netbook for three weeks before being removed. This serves all interested parties, from publishers to libraries to readers. Libraries must "move with the times to stay part of the times" and if you care passionately about libraries and the mission of libraries then embracing the obvious future with a new goal and mission for libraries must strike a chord.
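The three-week loan comes down to simple date arithmetic. The sketch below is a Python illustration only. It does not model any real DRM system, and the function names are made up for the example.

```python
from datetime import date, timedelta

LOAN_PERIOD = timedelta(weeks=3)  # the three-week checkout described above


def due_date(checked_out: date) -> date:
    """Date on which the eBook should be removed from the device."""
    return checked_out + LOAN_PERIOD


def is_expired(checked_out: date, today: date) -> bool:
    """True once the full loan period has elapsed."""
    return today >= due_date(checked_out)


print(due_date(date(2011, 2, 22)))  # 2011-03-15
```

A real lending service would also need renewals, waiting lists, and the publisher's terms, none of which are covered here.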

Digitizing Libraries


Many books today have been digitized. A significant portion of research is already digital. As the eBook initiative continues to build momentum in the scholarly community with the UPeC and UPSO it is only a matter of time before books are something we find in antique stores.

I am of a generation that loves the look and feel of books. But after watching my son lug 12 pounds of books in his backpack to school every day I am sure when the day comes he will not miss them one bit. I wait for the day when our local Charter school sends me the bill for a Kindle or iPad... The next generation simply has embraced an all-digital world and let's face it, there is really no looking back.

There is indeed an opportunity here. If the Birmingham library has 2.5M books stretching over seven floors at the disposal of residents all over the city imagine what a digital library could present. All the world’s libraries with billions of books available to every student on the globe with online access. Imagine putting that many books at the fingertips of every man, woman, and child in your community. If the goal of a library is to truly make knowledge available to the public then this new vision should be broadly embraced as rapidly as possible.

The Online Library

If we can contribute anything toward this inevitable revolution it should be how people use and interact with a digital library online. In a web 2.0 world this is a great opportunity to shape that future in a way that contributes to the knowledge of all participants. How will we search these vast repositories of digital libraries? How will we participate? That is the question we should really be asking ourselves.

Will advertising and commercial interests take over library research? Will pop-up ads for Halo become the norm when searching for books on Winston Churchill or more scholarly research for Tyrosine Phosphatase Receptors? I don't know about you but my kids are already exposed to enough. Commercialism and teaching are an uncomfortable mix. Access to the world's libraries should remain unrestricted and commercial-free. And Jumper can help you build a great app to find and share information resources without ceding more control over information access to ... (insert megalomaniacal privacy selling software company here). Whatever platform you choose, a combination of open source tools published under the GPL would best serve the needs of research and public libraries as they strive to meet the digital challenge.

Computers are not the enemy of the library. They are its greatest opportunity. It seems only a matter of time before we move completely to an app driven world. The laptop, Windows, Web world we know today will be swept away. I already rely on apps for countless services instead of searching the web for this information. The Google portal will be the big sacrifice in this transition and with it a significant portion of their ad revenue. Don't weep for Google as they will be Apple's prime competitor with the Android platform.

If search becomes just another app then how will our use of search change? No doubt search will become increasingly specialized and segmented. Hmm, a search app for each need and audience. OK, so here is my pitch for a library app. Just because the need for the four walls of a library disappears, the concept of what a library represents certainly should not. It is time to replace foot traffic with eyeballs just as the rest of the world is doing.


Read Part 2 Next  >

Saturday, January 29, 2011

Making Jumper 2.0 a Non-Profit Public Entity

Over the holidays it seems we never have enough time. The demands of life pile up around us; work, entertaining, parties, shopping, my relationship, the kids… the list goes on and on. It was over this holiday season that the demands of Jumper became a little too much to manage.

With my consulting career growing at an ever faster pace it was already difficult to manage new clients and keep up with the demands of running the Jumper project. Throw in all the holiday pressures and I had one of those moments. No not a break down, as any of you who talked to me during the holidays probably guessed… a moment of clarity – a hard realization.

Jumper needed to follow a new organizational path. It was not working as a commercial entity. Despite my Herculean efforts to wear every hat it was becoming impossible to keep up with the demands. The popularity of our software is growing rapidly: downloads continue unabated and are rapidly approaching 10,000, hits to the website continue to grow, at over 300% for 2010, and calls and emails for support are keeping pace with this growth. In fact, they seem to grow exponentially as I typically provide this support for free.

It was simply time to reorganize. To transition from a private to a public entity that could continue to promote the benefits of collaborative search and universal (as in any information) bookmarking that I believe in so much. It was time for Jumper to become a non-profit foundation that could begin to grow on its own. This will free up the many community members to make their own ideas happen, give you control of the direction of the software, and of course allow you to actually put your participation formally on your resume. And yes, I have done my share of reference letters in the last two years…

As a non-profit foundation we will invite community members to fill one of six board seats that will be open each year. This annual term allows a broad range of community members to participate over time, each adding their own unique contributions. If you would like to be a Jumper Foundation Director please reach out to me and let me know. Eventually each board member will be nominated and elected annually as is defined in the foundation's charter. In the beginning, however, as we get this up and going, I will be appointing the first board members. As a board member you will have access to all of the Jumper forums; you will be an admin on Jumper Sourceforge and on the Jumper Developers Group, a blogger in the Jumper 2.0 blog, with access to post on the Jumper Twitter and Facebook pages.

Any person may apply to be a member of the project and be eligible to be a member of the Association under our new Rules. All affairs of the Jumper Association shall be managed by the Board. There shall be no designated officers of the Association. All nominated or elected board members shall govern in a round-table forum with decisions made only by majority vote. Any three members of the board meeting via conference call or online meeting constitute a quorum for the conduct of the business of a meeting of the Board. Pretty informal and relaxed and completely fitting with our beach front digs.

I have corresponded with many of the community members over the holidays before making this change and have received unanimous support in this new direction. The consensus has been that the Jumper project is about people producing free and open software and contributing to something as a team for the benefit of others. To quote some of the emails “the Jumper project reflects the spirit of collaboration and fun and thrives on strong community feedback”, “we need better governance that allows for diverse businesses and organizations to confidently invest in its use and further development”, “it is important that it remain open to the participation of anybody who can contribute value and is willing to work with the community.” All of these comments, reflected in numerous emails received from community members, express the goals that will be better served by organizing Jumper 2.0 as a non-profit public entity open to everyone around the globe.

Perhaps most importantly this change is aimed squarely at meeting the concerns of the core development teams. Many of you have come and gone from the project over the last two years expressing dissatisfaction about Jumper Networks' commercial control of the software. You have felt you had no voice in its governance or the future direction of Jumper 2.0. This was never my intention and I regret the arrogance of this thinking. The contributions of all the developers are vital. Changing our organization to empower our community, and ceding control of the core development more fairly to all of the developers, will allow their skills and expertise to lead the project forward in a new direction. As it should be.

Jumper Networks Inc will cease to exist. It will transition all its assets to the Jumper 2.0 Foundation whose charter will be published on our new website www.jumpersearch.com. We will continue to develop and improve this award-winning software project and ensure that it continues to be released under the GNU General Public License. I will of course remain a part of the development team going forward and continue to provide free support to anyone who emails or calls. But more importantly I look forward to hearing from many of you who wish to take a more active and decisive role in the project.

Thanks,

Steve Perry

Tuesday, November 16, 2010

Connecting the Dots

Over the weekend I read Kevin Rivette's book “Rembrandts in the Attic,” which outlines the lost value buried in distributed documents, and what this underutilized intellectual property costs companies. A subject near and dear to my heart. But it wasn't until Sunday, when my 6th grade son asked me to review his paper on Francis Drake's journey in the south seas, that I connected the dots. I read how they explored and discovered new islands and peoples, and how they charted, documented, and mapped not only everything they found, but everywhere they went. All their charts and maps got me thinking.

Why don't we do this for our information? We document the output of a hypothesis or experiment, capture the data, and if the project is abandoned or failed we file it away. Often forgotten. Explorers make maps to capture what they learn so that the next visitor can find where they have been and go a little farther, learn a little more, avoid the same mistakes. Why can't we see information the same way? Apply a few tags to provide the "lay of the land" as it were to an information asset. Capture the context, meaning and value of it. A simple step, yet one that can make all the difference in discovering and leveraging our forgotten assets.

We are drowning in data. Berkeley researchers tell us that we generate 30% more information every year. The sequencing of the human genome over the past decade has led research centers in both the private and public sectors to place huge orders for thousands of servers and storage systems capable of handling terabytes of the new genomic, proteomic, drug, and health care data generated hourly.

Privately, we all struggle with this issue each day. Finding the information we're looking for. Few industries suffer more from this data deluge than pharmaceuticals. Many gifted and well-paid scientists and engineers spend 15% of their time trolling through federated storage or file servers for the data or documents they need. Sometimes they never find them, triggering rework, redundant tests, and the loss of untold millions of dollars each year. Despite significant investments in information technology, knowledge-based pharma remains “knowledge poor” in its day-to-day operations, at every step of the value chain, from discovery through distribution.

Big data has led to flexible storage solutions that scale massively, easily, and relatively cheaply, if you call pay as you go cheap. However, while the storage industry has met the challenge, pharmaceutical companies are realizing they are not making as much progress as they thought investing in genomics, proteomics, and informatics research. They're not getting the returns on investment. It is the tumultuous world of bioinformatics that has not fully met the challenge of the genomics revolution-in-waiting.

But why are we still struggling to connect the dots?

The real challenge is that the research process itself still remains personally competitive, often isolated, and widely distributed. Information exists, but unconnected. At a surprising number of firms, R&D teams are literally re-inventing the wheel, duplicating research that the company has already done, whose lessons are buried in some obscure and forgotten file. Knowledge is generated and then abandoned when research leads in a different direction.

These assets, both the data and the knowledge, remain just as isolated, distributed and unconnected. Dumped into bench-side databases or file servers. Even if they are effectively consolidated in a warehouse or content system they remain unconnected and without context. And the sheer growing volume of the data, papers, and images makes it increasingly difficult to find and discover a specific resource when you need it most. How we manage this information must change. And it must change before it is too late. We must change before it becomes impossible, or too costly, to retroactively fix the error of our ways.

The knowledge exists about all of this information. These small “Rembrandts” exist everywhere. The day it is stored in a database or filed away in a digital landfill the person that created it, the project team that worked on it, and the admins that manage it have that knowledge. They know what it is, why it was created, how it was created and what was learned from it. Yet that knowledge quickly evaporates. People move on to other projects, get excited about something else, or leave the company. What we know about the informational context and value begins to fade - like all memory. And every day more and more of this knowledge is lost. These small “Rembrandts”, that the organization paid dearly for, are being lost every day because no one can find them. Even if someone was lucky enough to stumble upon the data or the file in a year or two they often cannot interpret it correctly, or put it in the right context necessary to maximize its value.

Think of how easy it would be to apply just a few tags to that data table to make it more findable. A small description, a little provenance information, a link to a few seemingly unrelated papers to provide the missing “context”. Informational threads, human insight and experience, provided by another scientist can make all the difference in the world. But this demands that we change the way we think about information. We must view it not as an output of a project or hypothesis that was abandoned, but for what it really is... a learning process. Explorers make maps. Why don't researchers?
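To make the idea concrete, here is a small Python sketch of what such a "map" for one information asset might look like. The `AssetRecord` fields and the `find_by_tag` helper are assumptions made for illustration, not an existing schema, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """Hypothetical context record attached to one file or data table."""
    location: str                                  # where the asset lives
    description: str = ""                          # what it is, in a sentence
    provenance: str = ""                           # who made it, for which project
    tags: set = field(default_factory=set)         # free-form tags
    related: list = field(default_factory=list)    # links to papers and so on


def find_by_tag(records, tag):
    """Return the records carrying `tag`, matched case-insensitively."""
    wanted = tag.lower()
    return [r for r in records if wanted in {t.lower() for t in r.tags}]


records = [
    AssetRecord(
        location="fileserver/projects/example/results.csv",  # invented path
        description="Binding assay results from an abandoned project",
        provenance="Example lab team, 2009",
        tags={"assay", "Abandoned"},
        related=["example-paper-1"],
    ),
    AssetRecord(location="fileserver/misc/notes.doc", tags={"meeting"}),
]
print([r.location for r in find_by_tag(records, "abandoned")])
```

Filling in a record like this takes a minute on the day the work is filed away, when the people who know the context are still around to supply it.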

The visible world may be known, but the unseen world is just beginning to be explored. Why don't we see information for what it really is? An output of the exploration. Applying just a few tags to capture the context, meaning and value of your work will make all the difference. And while that benefit may at first appear to be for someone else, like karma, it may perhaps one day benefit you.

Thursday, November 11, 2010

What is this thing called Personal Search?

On a recent visit to a very large storage vendor I had a discussion about a social portal that they had developed. It was struggling to get user engagement and they were puzzled as to why.

My response seemed a surprise to them. Social software is personal. Users think of it as a personal tool. If you look at the most popular web 2.0 platforms like Facebook, Twitter, or even Delicious they are tools that provide a benefit to users on a personal level.

You cannot force user engagement. The tools either help users (so they use them), or they really don't provide much benefit (so they don't). Facebook is about personal expression, Twitter about having a voice, and Delicious is about sharing your own interests. If tools don't deliver this personal benefit then users will not use them.

In a roundabout way this gets us back to the company's social portal. Enterprise 2.0 software suffers from this lack of personal, intimate interaction. It has a corporate aura about it, residing on the corporate portal, and workers don't feel the same personal connection with it. Does it really help them get their work done or just create more work for them? One more application they have to use.

As we discussed Jumper and how it might help I told them that if they deployed Jumper on the same corporate portal that it would likely suffer the same fate. The users would not feel any personal connection with it.

Jumper works best when it is deployed directly into a community of users. Smaller deployments that can be customized, even personalized, to users' interests. I asked what groups had heavy information requirements and they mentioned the project managers, research lab teams, product development groups, etc. I suggested that a customized Jumper, running on a small VM with minimal system resource requirements, be deployed for each of these groups. Users feel a more personal connection with a bookmarking engine when it is focused in this way. Search returns only the results relevant to them, not the whole company. Resources have been tagged by colleagues that they know and trust, basically their friends at work who they can holler at over the cube wall.

You have to change the way you think about applications. From the web to mobile phones applications are more specialized, more organic in their user communities. And large organizations need to understand and adapt to this expectation from their users. Search is no different.

Personal search or a point solution approach really means a more customized or tailored approach to search that meets the unique needs of its users. It is precisely because Jumper is an open tool that you can change to meet your own unique requirements that this works so well. Precisely because it is license-free and lightweight that it can easily be deployed in this way. The ability to reflect personal or group interests includes greater flexibility in the terminology or data dictionary, which can be a hybrid of corporate taxonomy and group-based folksonomy. Specialist users have very specific and often highly technical terms that never make it into a formalized corporate taxonomy. Yet these terms matter to these specific users and make it easier to search and find things. Another critical factor is that the tag fields can be customized to meet the unique needs of users. For instance, with structured knowledge tags a materials engineer or biologist will have very different tagging needs than a SAN storage architect or a chemist. One might require a tag to identify the protein, the other a tag to identify the compound, etc. A point solution approach allows the local bookmarking engine to be highly customized to meet these unique needs in a way a centralized system never could.
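One way to picture group-specific tag fields alongside a hybrid vocabulary is the Python sketch below. It is an illustration under stated assumptions, not Jumper's configuration format. The group names, the `CORPORATE_TAXONOMY` set, and the `check_bookmark` function are all hypothetical.

```python
# Hypothetical shared vocabulary agreed at the corporate level.
CORPORATE_TAXONOMY = {"assay", "compound", "protein", "storage"}

# Hypothetical per-group schemas: each group requires its own structured field.
GROUP_SCHEMAS = {
    "biology": {"required_fields": ["protein"]},
    "chemistry": {"required_fields": ["compound"]},
}


def check_bookmark(group, fields, tags):
    """Return (missing_fields, folksonomy_tags) for one bookmark.

    missing_fields: structured fields the group requires but that are empty.
    folksonomy_tags: free tags that are not part of the corporate taxonomy.
    """
    schema = GROUP_SCHEMAS[group]
    missing = [f for f in schema["required_fields"] if not fields.get(f)]
    folk = sorted(t for t in tags if t not in CORPORATE_TAXONOMY)
    return missing, folk


print(check_bookmark("biology",
                     {"protein": "example-protein"},
                     {"protein", "knockout-mouse"}))
print(check_bookmark("chemistry", {}, set()))
```

The group's own terms are kept rather than rejected, so a local deployment can carry vocabulary that would never survive a company-wide taxonomy review.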

Personal search is not enterprise search. We understand that this tool must be simple. Easy to use, easy to navigate and intuitive. It must also provide a direct and immediate benefit to users. There must be something in it for them. It must not be a generalized tool; it must be very specific, even personal, for each user to see and feel the value. It is more like a cube conversation. In this sense it must be localized. Most users with similar skills or job descriptions sit together on the same floor, and their conversations are based around this shared understanding. The tool must have the same level of intimacy.

This is what we mean by personal search and it requires an entirely new way of thinking about enterprise search. And it is often that thought process that is the hardest thing to change.

Thursday, August 19, 2010

Jumper in China?

I was browsing some web stats recently and happened to find a Jumper installation in China. Normally that would not be unusual. China is, after all, our second-largest source of traffic after the US. In fact, this was the third one that I have found in China this month. What was unusual is that it was not inside some Chinese company I had never heard of. This one had a public IP address. It was on the public web in China!

This has been a growing phenomenon over the last several months, with public sites literally popping up all over the world (India, Poland, Estonia, Russia, and Germany just this month). However, no one had yet posted one online in China. Yet there it was, a Jumper search engine in what I think is Mandarin, on the Internet inside China. Wow.

What did it mean? Was someone bypassing the government? Jumper is light-weight and portable, so users could easily move it to another address when needed. Or was it simply small enough to fall under the government's radar? My head was spinning for a second...

It is really quite astonishing to me. This little software program has been nothing short of amazing since I first created it. Jumper started as a simple tagging engine to enrich metadata in a small project with a very limited budget. After the project I added a search page to it and posted it on Sourceforge thinking that was it.

I returned to the same life sciences company a few months later (on another consulting engagement) and was pleased to see the tagging engine was still integrated into their Intranet search. When I reached out to the original project team, several told me, to my surprise, that they had since deployed the full Jumper 2.0 software in their department. When I asked why, the answer surprised me. “If I know where to look I can usually find what I’m looking for - the problem is when I have no idea where to look, then it is almost impossible.” OK, so I paraphrased a little. The point is that it was the discovery aspect of the software that they loved. Enterprise information is distributed. You need to know where to look. With Jumper they could find all kinds of information that they never knew existed. Tagging was merely a means to an end.

And now Jumper could bring down governments? OK, so my imagination got a little carried away with the possibilities… But this I certainly never saw coming. Jumper has always been an enterprise search engine, and I was fascinated by this new use of the software. When I inquired with one of these deployments, I found users alienated from the traditional search model. Jumper gave them the tool to create a culturally friendly search engine, created by users like themselves, that met their unique interests. Lawyers in Estonia could create a search engine that met their culturally unique and local legal needs in a way no vertical or general search engine ever could. Scientists at a university in Germany could do the same, and so could programmers in Russia and developers in India. The potential seems unlimited.

A new global economic and technical infrastructure is emerging, built on networked, social computing. In the next ten years a billion new people around the globe will gain a productive foothold in this economy and become an increasingly significant online force. They will be young and will look to do things differently. The old model of monolithic search provided by a few companies will no longer meet all of their needs. They will be culturally splintered, with vastly diverging interests, and will look for a more flexible search model that will better meet their unique needs. They will shatter the current search model into millions of pieces; culturally unique, community based, and socially oriented pieces.

From a simple project two years ago to an emerging global phenomenon? Well, perhaps not yet. We still have a long way to go, but things are starting to get very interesting.

Tuesday, August 3, 2010

Building Social into Solr

We have had a number of customers inquire about customizing specific aspects of Solr search with Jumper.

There are really two approaches. The first is to build Jumper tagging into your search engine interface, allowing users to tag documents or content when it is stored. The second is to import Jumper tagging fields into Solr using the DataImportHandler. This is done using basic JDBC connectivity. Tags stored in the Jumper search engine are imported into the Solr index, attached to a document, and returned when it is searched. Using facet fields, you can allow users to filter search results based on the knowledge tags applied by other users.

This is perhaps the easiest method. The two services can be bundled in a single web interface. In this way you are removing the Jumper search engine and replacing it with Solr. This gives you the best of both worlds, full-text searching and user tagging, to deliver better, more detailed search results.
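
As a rough illustration of the DataImportHandler route, a data-config.xml along the following lines could pull tags over JDBC. This is only a sketch: the MySQL driver, the connection URL, the credentials, and the bookmarks and tags tables with their columns are all assumptions made for the example, not Jumper's actual schema.

```xml
<dataConfig>
  <!-- Assumed: Jumper's data lives in a MySQL database named "jumper" -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/jumper"
              user="solr_reader"
              password="change-me"/>
  <document>
    <!-- Hypothetical table: one row per tagged document or bookmark -->
    <entity name="bookmark" query="SELECT id, url, title FROM bookmarks">
      <field column="id" name="id"/>
      <field column="url" name="url"/>
      <field column="title" name="title"/>
      <!-- Hypothetical table: one row per tag, joined on the parent id -->
      <entity name="tag"
              query="SELECT tag FROM tags WHERE bookmark_id = '${bookmark.id}'">
        <field column="tag" name="sm_field_keyword_tag"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The nested entity runs once per parent row, so each document ends up with all of its tags in one multi-valued field. The target field must exist in the Solr schema, either explicitly or through a matching dynamic field.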

If you prefer to embed custom search paths into Solr, the primary method is to use facet fields. A Jumper tagging interface can be added when storing documents. The Jumper tag fields are then stored as facet fields that Solr will search in addition to its full-text parsing of the document. Faceting is done on indexed rather than stored values.
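
To make the facet idea concrete, here is a minimal sketch of the kind of request a search page could send to Solr. The helper name jumper_facet_url, the host, and the field names are assumptions for illustration; the facet, facet.field, and fq parameters are standard Solr query parameters.

```php
<?php
/**
 * Hypothetical helper: build a Solr select URL that facets on tag fields
 * and optionally narrows the results to specific tag values.
 */
function jumper_facet_url($base, $keywords, array $facetFields, array $filters = []) {
  $parts = [
    'q=' . rawurlencode($keywords),
    'facet=true',
  ];
  foreach ($facetFields as $field) {
    // facet.field is repeated once per field to facet on.
    $parts[] = 'facet.field=' . rawurlencode($field);
  }
  foreach ($filters as $field => $value) {
    // fq restricts results to documents carrying this tag value.
    // Quotes and backslashes in the value are escaped for the phrase query.
    $quoted = '"' . addcslashes($value, '"\\') . '"';
    $parts[] = 'fq=' . rawurlencode($field . ':' . $quoted);
  }
  return $base . '/select?' . implode('&', $parts);
}

// Assumed host and field name, for illustration only.
echo jumper_facet_url(
  'http://localhost:8983/solr',
  'protein folding',
  ['sm_field_keyword_tag'],
  ['sm_field_keyword_tag' => 'kinase']
), "\n";
```

The response to such a request carries the normal results plus a count for each tag value, which is what lets a page show tags applied by other users as clickable filters.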

This requires that we add a number of Jumper tags to the Solr index separately and add a custom sort to Solr search. With Drupal's apachesolr module, adding a new Jumper tag field to the search results requires two very small hook implementations: hook_apachesolr_update_index() and hook_apachesolr_modify_query(). To start, let’s just add the keyword tag field to the Solr index.

/**
 * Implementation of hook_apachesolr_update_index().
 *
 * Adds the Jumper keyword tags to the Solr document as a multi-valued field.
 */
function mymodule_apachesolr_update_index(&$document, $node) {
  // Index field_keyword_tag as a separate field.
  if ($node->type == 'profile') {
    // Profile nodes take their tags from the owning user account.
    $account = user_load(array('uid' => $node->uid));
    $document->setMultiValue('sm_field_keyword_tag', $account->tags);
  }
  elseif (!empty($node->field_keyword_tag)) {
    foreach ($node->field_keyword_tag as $keyword) {
      // Assumes a CCK text field, whose items keep their text under 'value'.
      $document->setMultiValue('sm_field_keyword_tag', $keyword['value']);
    }
  }
}

All we do is add the data to the index by adding it to the $document object, which is passed by reference. We used the setMultiValue method since the tag field can have multiple values; if we were adding just a single value, we would use the addField method. The field name is simply the 'sm_' dynamic field prefix with field_keyword_tag appended: the field contains keyword strings, and the sm_ prefix denotes a multi-valued string field.

Now that the data has been added to the index, we also need to add it to the query so it can be returned with the search results:
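/** Implementation of hook_apachesolr_modify_query(). */
function mymodule_apachesolr_modify_query(&$query, &$params, $caller) {
  // Ask Solr to return the tag field with each search result.
  $params['fl'] .= ',sm_field_keyword_tag';
}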

And that's all there is to it… This can be repeated for each of the Jumper knowledge tags that you want to add. All you're doing is some basic PHP string concatenation, appending your newly indexed field to the list of fields to return (the 'fl' entry) in $params. We are, however, simplifying the details of the $params format a little for the sake of brevity in this post.
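
The plain concatenation above assumes 'fl' already holds at least one field name. When several tag fields are added from different places, a slightly more defensive version may be useful. The following is a sketch only: mymodule_append_fl is a hypothetical helper, not part of the apachesolr module, and sm_field_topic_tag is an invented second tag field.

```php
<?php
/**
 * Hypothetical helper: add field names to the comma-separated 'fl' string,
 * avoiding duplicates and a stray leading comma when 'fl' is empty.
 */
function mymodule_append_fl(array $params, array $fields) {
  $current = [];
  if (isset($params['fl']) && $params['fl'] !== '') {
    $current = array_map('trim', explode(',', $params['fl']));
  }
  foreach ($fields as $field) {
    if (!in_array($field, $current, true)) {
      $current[] = $field;
    }
  }
  $params['fl'] = implode(',', $current);
  return $params;
}

// Example: request two Jumper tag fields alongside the existing field list.
$params = ['fl' => 'id,title'];
$params = mymodule_append_fl($params, ['sm_field_keyword_tag', 'sm_field_topic_tag']);
echo $params['fl'], "\n";
```

Inside the hook this would be called as $params = mymodule_append_fl($params, array('sm_field_keyword_tag')); in place of the direct concatenation.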

In general, adding Jumper social tagging features into your Solr search is pretty easy, and can deliver some very powerful capabilities to your search functionality.