It was Steve Arnold who fathered the ABI/INFORM taxonomy, and we owe him big time. In it he created two telecommunications categories: one treating telecom as an industry and another treating it as an operational function. Why does that matter?
Arnold understood that a taxonomy had to be strongest where search was weakest: separating objects from predicates and actions from actors, the simple mechanics of grammar. His decision to isolate industries from operations was an elegant way to map verticals (industries) and horizontals (business functions) within what the pre-web world called full-text databases.
So now we've got big, messy data sets whose only barrier to entry is hitting the Enter key. How do we create clarity around the events we anticipate? How do we become more certain that we can identify who is driving those events and who is on the receiving end?
The framework I use to teach this to my BU students is "word algebra." In word algebra, the Boolean construct is:
actor1 OR actor2 "(verb1 OR verb2)*(outcome1 OR outcome2)"
Plug in the terms a future crime fighter would use, and the Google query becomes:
police OR cops "(taking OR took OR take)*(bribe OR bribes)"
The beauty of the formula is that it tunes your results to the consequences of actions. That nearly always yields more interesting results than a topic-based query built from keywords and subject classifications.
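Because the construct is so regular, it can be generated mechanically rather than typed by hand. Here is a minimal Python sketch, assuming actors, verbs and outcomes are kept as plain lists of terms. The helper names `or_group` and `word_algebra_query` are hypothetical, not part of any search engine's API. The sketch only assembles the string; whether a particular engine honors OR and parentheses inside a quoted phrase is up to that engine.

```python
def or_group(terms):
    """Join terms with OR and wrap them in parentheses: (a OR b)."""
    return "(" + " OR ".join(terms) + ")"


def word_algebra_query(actors, verbs, outcomes):
    """Assemble: actor1 OR actor2 "(verb1 OR verb2)*(outcome1 OR outcome2)".

    Actors stay outside the quotes; the verb group and the outcome group
    are bound into one quoted phrase, joined by the * wildcard.
    (Hypothetical helper for illustration only.)
    """
    actor_part = " OR ".join(actors)
    phrase = or_group(verbs) + "*" + or_group(outcomes)
    return f'{actor_part} "{phrase}"'


# Reproduce the police-bribery example from the text.
query = word_algebra_query(
    actors=["police", "cops"],
    verbs=["taking", "took", "take"],
    outcomes=["bribe", "bribes"],
)
print(query)
# police OR cops "(taking OR took OR take)*(bribe OR bribes)"
```

Swapping in a different actor list or verb set while holding the outcome group fixed is the whole trick. The builder doesn't care which domain the terms come from.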
Better still, consider an entire index of query strings that chronicle a common set of outcomes, say police corruption. The result is a formula that works better as an indexing agent for information-gathering than any static criminal-justice taxonomy.
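Such an index can be sketched the same way: hold one outcome group fixed and vary the verbs around it. This is a hypothetical Python sketch. The labels and the extra verb sets (demanding, accepting) are illustrative assumptions, not a vetted vocabulary. The query helpers are redefined here so the block stands alone.

```python
def or_group(terms):
    """Join terms with OR inside parentheses: (a OR b)."""
    return "(" + " OR ".join(terms) + ")"


def word_algebra_query(actors, verbs, outcomes):
    """actor1 OR actor2 "(verb1 OR verb2)*(outcome1 OR outcome2)"."""
    return f'{" OR ".join(actors)} "{or_group(verbs)}*{or_group(outcomes)}"'


# The shared outcome set that anchors the whole index.
CORRUPTION_OUTCOMES = ["bribe", "bribes"]

# Each entry pairs actors with a verb set. The labels and the
# "demanding"/"accepting" verb sets are hypothetical examples.
INDEX_SPEC = {
    "police take bribes": (["police", "cops"], ["taking", "took", "take"]),
    "police demand bribes": (["police", "cops"], ["demanding", "demanded", "demand"]),
    "police accept bribes": (["police", "cops"], ["accepting", "accepted", "accept"]),
}


def build_index(spec, outcomes):
    """Map each label to a query string that shares the same outcome group."""
    return {
        label: word_algebra_query(actors, verbs, outcomes)
        for label, (actors, verbs) in spec.items()
    }


corruption_index = build_index(INDEX_SPEC, CORRUPTION_OUTCOMES)
for label, q in corruption_index.items():
    print(f"{label}: {q}")
```

Adding a new angle on the same outcome means adding one entry to the spec, not redesigning a category tree.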