
Tuesday, November 16, 2010

Connecting the Dots

Over the weekend I read Kevin Rivette's book “Rembrandts in the Attic,” which outlines the lost value buried in distributed documents and what this underutilized intellectual property costs companies. It is a subject near and dear to my heart. But it wasn't until Sunday, when my 6th grade son asked me to review his paper on Francis Drake's journey in the south seas, that I connected the dots. I read how the explorers discovered new islands and peoples, and how they charted, documented, and mapped not only everything they found but everywhere they went. All those charts and maps got me thinking.

Why don't we do this for our information? We document the output of a hypothesis or experiment, capture the data, and, if the project fails or is abandoned, file it away, where it is often forgotten. Explorers make maps to capture what they learn so that the next visitor can find where they have been and go a little farther, learn a little more, and avoid the same mistakes. Why can't we see information the same way? We could apply a few tags to give an information asset its "lay of the land," as it were, capturing its context, meaning, and value. It is a simple step, yet one that can make all the difference in discovering and leveraging our forgotten assets.

We are drowning in data. Berkeley researchers tell us that every year we generate 30% more information than the year before. The sequencing of the human genome over the past decade has led research centers in both the private and public sectors to place huge orders for thousands of servers and storage systems capable of handling the terabytes of new genomic, proteomic, drug, and health care data generated hourly.

Privately, we all struggle with this issue each day: finding the information we're looking for. Few industries suffer more from this data deluge than pharmaceuticals. Many gifted and well-paid scientists and engineers spend 15% of their time trawling through federated storage or file servers for the data or documents they need. Sometimes they never find them, triggering rework, redundant tests, and the loss of untold millions of dollars each year. Despite significant investments in information technology, knowledge-based pharma remains “knowledge poor” in its day-to-day operations, at every step of the value chain, from discovery through distribution.

Big data has led to flexible storage solutions that scale massively, easily, and relatively cheaply, if you call pay-as-you-go cheap. However, while the storage industry has met the challenge, pharmaceutical companies are realizing that their investments in genomics, proteomics, and informatics research are not delivering as much progress as they expected. They're not getting the return on investment. It is the tumultuous world of bioinformatics that has not fully met the challenge of the genomics revolution-in-waiting.

But why are we still struggling to connect the dots?

The real challenge is that the research process itself remains personally competitive, often isolated, and widely distributed. Information exists, but it is unconnected. At a surprising number of firms, R&D teams are literally re-inventing the wheel, duplicating research that the company has already done and whose lessons are buried in some obscure and forgotten file. Knowledge is generated and then abandoned when research leads in a different direction.

These assets, both the data and the knowledge, remain just as isolated, distributed, and unconnected, dumped into bench-side databases or file servers. Even if they are effectively consolidated in a warehouse or content system, they remain unconnected and without context. And the sheer, growing volume of data, papers, and images makes it increasingly difficult to find a specific resource when you need it most. How we manage this information must change, and it must change before it becomes impossible, or prohibitively costly, to retroactively fix the error of our ways.

The knowledge about all of this information exists. These small “Rembrandts” are everywhere. On the day an asset is stored in a database or filed away in a digital landfill, the person who created it, the project team that worked on it, and the admins who manage it have that knowledge. They know what it is, why it was created, how it was created, and what was learned from it. Yet that knowledge quickly evaporates. People move on to other projects, get excited about something else, or leave the company. What we know about the asset's context and value begins to fade, like all memory, and every day more of this knowledge is lost. These small “Rembrandts” that the organization paid dearly for are being lost every day because no one can find them. Even if someone is lucky enough to stumble upon the data or the file in a year or two, they often cannot interpret it correctly or put it in the context needed to maximize its value.

Think of how easy it would be to apply just a few tags to that data table to make it more findable: a small description, a little provenance information, a link to a few seemingly unrelated papers to provide the missing “context”. Informational threads, the human insight and experience provided by another scientist, can make all the difference in the world. But this demands that we change the way we think about information. We must view it not as the output of a project or hypothesis that was abandoned, but as what it really is: a learning process. Explorers make maps. Why don't researchers?
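To make the idea of "a few tags" concrete, here is a minimal sketch in Python of what such a record might look like. Everything in it is an assumption made for illustration: the `TaggedAsset` class, its field names, the `find_by_tag` helper, and the sample paths and references are all invented and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedAsset:
    """Hypothetical record describing one stored information asset."""
    location: str          # where the file or table lives
    description: str       # the "small description"
    provenance: str        # who created it, and for which project
    tags: set[str] = field(default_factory=set)
    related: list[str] = field(default_factory=list)  # links that supply context


def find_by_tag(assets, tag):
    """Return the assets carrying the given tag (case-insensitive)."""
    wanted = tag.lower()
    return [a for a in assets if wanted in {t.lower() for t in a.tags}]


# Made-up example data: two assets that would otherwise sit unlabeled on a share.
assets = [
    TaggedAsset(
        location="labshare/assays/run_042.csv",
        description="Binding assay results from an abandoned project",
        provenance="Created by the assay team, 2010",
        tags={"binding-assay", "Kinase"},
        related=["internal-report-17"],
    ),
    TaggedAsset(
        location="labshare/imaging/plate_7.tif",
        description="Microscopy images, plate 7",
        provenance="Imaging core",
        tags={"microscopy"},
    ),
]

hits = find_by_tag(assets, "kinase")
print([a.location for a in hits])
```

The point of the sketch is how little is needed: a description, a provenance note, a handful of tags, and a link or two are enough for the next person to find the asset and understand why it exists.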

The visible world may be known, but the unseen world is just beginning to be explored. Why don't we see information for what it really is, an output of that exploration? Applying just a few tags to capture the context, meaning, and value of your work will make all the difference. And while that benefit may at first appear to be for someone else, like karma, it may one day benefit you.

Thursday, November 11, 2010

What is this thing called Personal Search?

On a recent visit to a very large storage vendor I had a discussion about a social portal that they had developed. It was struggling to get user engagement and they were puzzled as to why.

My response seemed to surprise them. Social software is personal. Users think of it as a personal tool. If you look at the most popular Web 2.0 platforms, like Facebook, Twitter, or even Delicious, they are tools that provide a benefit to users on a personal level.

You cannot force user engagement. Either the tools help users (so they use them), or they don't provide much benefit (so they don't). Facebook is about personal expression, Twitter is about having a voice, and Delicious is about sharing your own interests. If tools don't deliver this personal benefit, users will not use them.

In a roundabout way this gets us back to the company's social portal. Enterprise 2.0 software suffers from a lack of this personal, intimate interaction. It has a corporate aura about it, residing on the corporate portal, and workers don't feel the same personal connection with it. Does it really help them get their work done, or does it just create more work for them, one more application they have to use?

As we discussed Jumper and how it might help, I told them that if they deployed Jumper on the same corporate portal, it would likely suffer the same fate. The users would not feel any personal connection with it.

Jumper works best when it is deployed directly into a community of users, in smaller deployments that can be customized, even personalized, to users' interests. I asked which groups had heavy information requirements, and they mentioned the project managers, research lab teams, product development groups, and so on. I suggested that a customized Jumper, running on a small VM with minimal system resource requirements, be deployed for each of these groups. Users feel a more personal connection with a bookmarking engine when it is focused in this way. Search returns only the results relevant to them, not to the whole company. Resources have been tagged by colleagues they know and trust, basically their friends at work, the people they can holler at over the cube wall.

You have to change the way you think about applications. From the web to mobile phones, applications are becoming more specialized and more organic in their user communities. Large organizations need to understand this expectation from their users and adapt to it. Search is no different.

Personal search, or a point solution approach, really means a more customized or tailored approach to search that meets the unique needs of its users. It is precisely because Jumper is an open tool that you can change to meet your own unique requirements that this works so well, and precisely because it is license-free and lightweight that it can easily be deployed in this way. The ability to reflect personal or group interests includes greater flexibility in the terminology or data dictionary, which can be a hybrid of corporate taxonomy and group-based folksonomy. Specialist users have very specific and often highly technical terms that never make it into a formalized corporate taxonomy. Yet these terms matter to these specific users and make it easier to search and find things. Another critical factor is that the tag fields can be customized to meet the unique needs of users. For instance, with structured knowledge tags, a materials engineer or biologist will have very different tagging needs than a SAN storage architect or a chemist. One might require a tag to identify the protein, the other a tag to identify the compound, and so on. A point solution approach allows the local bookmarking engine to be highly customized to meet these unique needs in a way a centralized system never could.
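As a rough illustration of the hybrid vocabulary and the per-group tag fields described above, here is a minimal sketch in Python. It is not Jumper's data model or API; every name in it (`CORPORATE_TAXONOMY`, `GROUP_FOLKSONOMY`, `GROUP_REQUIRED_FIELDS`, and the three helper functions) and every sample term is an assumption invented for the example.

```python
# All names and terms below are hypothetical, chosen only to illustrate the idea.

# Terms every group shares, maintained centrally.
CORPORATE_TAXONOMY = {"project", "report", "protocol"}

# Terms each group has coined for itself.
GROUP_FOLKSONOMY = {
    "biology": {"western-blot", "knockout"},
    "chemistry": {"suzuki-coupling", "hplc"},
}

# Structured tag fields each group wants filled in on its bookmarks.
GROUP_REQUIRED_FIELDS = {
    "biology": {"protein"},
    "chemistry": {"compound"},
}


def vocabulary(group):
    """Hybrid vocabulary: the corporate taxonomy plus the group's own terms."""
    return CORPORATE_TAXONOMY | GROUP_FOLKSONOMY.get(group, set())


def add_group_term(group, term):
    """Folksonomy terms are open: any member of the group may add one."""
    if term not in CORPORATE_TAXONOMY:
        GROUP_FOLKSONOMY.setdefault(group, set()).add(term)


def missing_fields(group, fields):
    """Return the group's required tag fields that a bookmark has not filled in."""
    return sorted(GROUP_REQUIRED_FIELDS.get(group, set()) - set(fields))


# A storage team adds its own jargon without touching the corporate taxonomy.
add_group_term("storage", "lun-masking")
print(sorted(vocabulary("storage")))

# A biology bookmark tagged with a compound but no protein is incomplete.
print(missing_fields("biology", {"compound": "aspirin"}))
```

Because each group's deployment carries its own folksonomy and its own required fields, the biologists and the chemists can diverge freely while still sharing the small set of corporate terms.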

Personal search is not enterprise search. We understand that this tool must be simple: easy to use, easy to navigate, and intuitive. It must also provide a direct and immediate benefit to users. There must be something in it for them. It cannot be a generalized tool; it must be very specific, even personal, for each user to see and feel the value. It is more like a cube conversation, and in this sense it must be localized. Users who share a skill set or job description usually sit together on the same floor, and their conversations are built on that shared understanding. The tool must have the same level of intimacy.

This is what we mean by personal search and it requires an entirely new way of thinking about enterprise search. And it is often that thought process that is the hardest thing to change.