Is Search a Solved Problem?

September 1, 2012

“Is search a solved problem?”  This question came up at the industry panel of SIGIR 2012.  However, most of the panelists evaded the question in its most strict sense, and pointed at new search problems or new relevance criteria such as the combination of relevance and recency.  In its most strict sense, this question should be whether full text ad hoc retrieval is a solved problem.  Other ways of asking this question include “What is the next big thing in retrieval modeling?”, “How can we consistently and substantially improve over the basic tf.idf retrieval models?”, “Why haven’t retrieval models improved over BM25 for the last almost 20 years?”, “How can one improve over Google at basic full-text Web search?”

I believe my dissertation on the term mismatch problem provides a satisfactory answer for many of these similar questions.

The first half of this article comments on the questions and common mis-conceptions that people have about retrieval modeling, and show how they can be answered based on the new understanding of the term mismatch problem and of its relationship to the tf.idf retrieval models.  The second half of this article directly cites paragraphs from the introduction chapter of my dissertation, since these paragraphs directly address the question “is search a solved problem”.  The interested readers can explore the full dissertation.

In IR research, there is a mis-conception or mis-perception about the limitations of the current retrieval models.  For example, in the Salton lecture of SIGIR 2012, Norbert Fuhr focused on the term independence assumption made by the Binary Independence Model (BIM).  However, what people often overlook is that another assumption is made when applying the BIM in ad hoc retrieval, e.g. in Okapi BM25.  It is commonly assumed that the probability of a term t occurring in the relevant set of documents R, P(t|R), is 0.5.  This assumption is made because in ad hoc retrieval the relevant set of documents for a query is typically unknown beforehand.  This assumption cripples the BIM by removing the relevance variable completely from the full model.  My dissertation shows evidence that in many cases it creates a much more significant problem than the term independence assumption does.  My dissertation also shows that this probability is directly related to the term mismatch problem, and that solving the term mismatch problem could improve retrieval by over 50-300%.  This means the mismatch problem happens deep down at the retrieval model level, and that it is a significant problem with a huge potential.

Because of neglecting the mismatch problem, when asked where can search be improved, people often resort to something else.  For example, in the Salton lecture, Norbert Fuhr mentioned that IR is about vagueness, which I agree, but then resorted to some sort of user interaction to clarify the information need, and stepped outside the realm of retrieval modeling.  Even within the scope of retrieval modeling, even when the search request is clearly described, like most of the TREC queries, search is still not a solved problem.  How can search be solved, when we are only getting a search accuracy of around 0.2 to 0.5 in standard evaluations?  In the industry panel of SIGIR 2012, Diane Kelly, like Norbert Fuhr, also pointed to user interactions when answering the title question.  Krishna Gade from Twitter changed the relevance criterion to consider recency along with relevance, which is just another way to avoid the question “is search a solved problem” in its original sense.  Jerome Pesenti mentioned some cases of enterprise search, such as facilitating the user in completing a task that uses search as a component, which also stepped outside search strictly defined.  I am not 100% sure whether it was Trystan Upstill who mentioned structured retrieval or semantic retrieval, but structured retrieval is more of a tool than a problem.  Furthermore, strict structured matching between queries and answer texts, e.g. phrases or using parsing structure of the query, worsens the mismatch problem.

Here, I am not trying to argue that retrieval modeling is the only problem in search.  These other directions are certainly valid and valuable research directions, and I am aware that search is only part of a bigger picture of decision making or task completion.  However, the retrieval model is the most fundamental component underlying a search system, and it affects any system or task that uses search as a component.  Fundamental limitations of the retrieval models would affect system performance, would affect the behavior of the retrieval techniques that work on top of the retrieval model, as well as the behavior of the search users.  This is true everywhere search is executed.  Because of that, understandings of the fundamentals should always be pursued with a higher level of enthusiasm than anything peripheral.  Now that we know that mismatch is an important problem in retrieval modeling, there is every reason to understand and to solve this term mismatch problem.

Of all the industry panelists, Steve Robertson’s point about searching on small collections such as desktop and some enterprise search is perhaps the only one about the core retrieval task.  More generally, a small collection typically implies a case of recall-oriented search, where an emphasis is given to finding all relevant documents in the collection.  In the case of desktop search, usually there is only one document that the user wants.  In the case of legal discovery, patent search or bio/medical retrieval, the cost of missing a relevant document can be very high.  Now that we know that mismatch is a fundamental limitation of the retrieval models, it is perhaps not so surprising that the current retrieval systems are not doing well on these recall oriented tasks.  Thus, underlying the small-collection search problem is the term mismatch problem.  If Bruce Croft were on the panel, he would have probably mentioned long queries, which is also the term mismatch in disguise, because the longer the query gets the more likely the query terms will mismatch relevant documents, and the more likely retrieval accuracy will be hurt.

On large collections like the Web, is search solved?  Definitely not.  Searchers doing informational searches are still constantly frustrated by the major Web search engines (see [Feild, Allan and Jones 2010]), and perhaps mismatch is one of the major problems causing the frustration.  Even for precision-oriented search, my dissertation shows that successfully solving the mismatch problem can substantially improve precision as well, and it is still valuable to solve mismatch.  Are there problems other than mismatch?  Yes, definitely.  Sometimes, precision can also be a problem.  However, a mismatch infected query may look like a precision problem to the untrained eye, and a wrong diagnosis may cost the searcher a lot of time trying to fix the query in futile ways.  Are there other problems?  Maybe, but that has to be carefully investigated, clearly defined and connected back to the fundamentals.

—— The following text is cited from my dissertation ——

Web search engines like Google have popularized the use of search technology to the extent that search has become part of our daily life. These Web search engines have also made search so easy to use that an ordinary search user would be satisfied with the results from the search engine most of the times. Some would even think that search is a solved problem.

This dissertation argues that search is far from being solved. As long as machines cannot perfectly understand human language as humans do, search is an unsolved problem. Some may still disagree, thinking that although it’s probably true that machines still cannot perfectly understand human language, it is search that demonstrated that a natural language processing task can be tremendously successful even with very simple algorithms and easy to compute statistics (Singhal 2001). In fact, attempts to use more complex natural language processing techniques in retrieval have mostly been proved futile. Even people working intimately with retrieval technology have the impression that the research on basic retrieval algorithms is hitting a plateau. Some may even think that there is probably not much room for improvement. After all, the retrieval models of the 1990s and early 2000s (Okapi BM25 or Statistical Language Models) are still the standard baselines to compare to, and the go-to models of the modern day retrieval systems (Manning et al. 2008, Chapters 6 and 7).

We argue that search ranking, retrieval modeling to be specific, is far from being solved.

Firstly, the success of Web search engines is largely due to the vast and diverse set of high quality documents to retrieve from. Because of the diverse set of relevant documents out there, even if a search query is not well formulated, even if the retrieval algorithm cannot return most of the relevant documents in the collection, a few good ones will match the query and be ranked at the top of the rank list to satisfy the user. Just that these search engines can return satisfying results for most queries does not necessarily mean that the retrieval models used by these systems are successful, and the impression that the simple retrieval models are successful can be just an illusion created by the successful collection of a huge set of documents for the retrieval algorithm to search against.

Secondly, in cases where the document collection is small, where the set of documents relevant to the query is small, or where a high level of retrieval recall is needed, the current retrieval systems are still far from being satisfactory. For example, in perhaps all forms of desktop search and some forms of enterprise search, the document collection is much less diverse and much smaller than the Web, and the users can still be easily frustrated by the search systems. Even in Web search, for informational searches, users are still frequently frustrated by the current search engines (Feild et al. 2010). In legal discovery, the lawyers from both sides of the litigation care a lot about not missing any potentially relevant document, and usually spend lots of time on carefully creating effective search queries to improve search effectiveness.

Some may still ask, if search is not yet solved, why are the baseline retrieval models so difficult to surpass, and where can we see any large improvements? We show in this dissertation that two central and long standing problems in retrieval, vocabulary mismatch (Furnas et al. 1987) and relevance based term weighting (Croft and Harper 1979; Greiff 1998; Metzler 2008), might be the culprit. We show that the two problems are directly related, the vocabulary mismatch problem being the more general version of the two. We show that the current retrieval models do not effectively model the vocabulary mismatch between query terms and relevant results. We show that term mismatch is a very common problem in search, and that a large potential gain is possible. We demonstrate several initial successes in addressing term mismatch in retrieval using novel prediction methods and theoretically motivated retrieval techniques. These techniques can automatically improve the retrieval system by making it mismatch-aware. Ordinary search users can also manually apply the query expansion technique studied in this work to further reduce mismatch and increase the effectiveness of their searches.


It takes a scientist to take good care of a newborn

April 13, 2012

Baby cries, parents need to diagnose the problem, form hypothesis of what might be wrong, do some testing to check if that’s really the case, and solve the problem.  A wrong diagnosis can only make a baby more upset, or disrupt the routine that is being established with the baby.

Baby doing everything, parents might want to log down all the activities including the surrounding environment.  This data can be used to identify outliers, or changes in the baby, allowing parents to respond more promptly.  For example, baby is eating more, baby is having stomachaches more frequently, baby is sleeping less when it gets warmer, some of these can be perfectly normal, some can mean a problem.

Luckily, newborns have a limited memory and are just a combination of simple reflexes.  Wondering what the scientist can do when babies grow up, the problems become more complex, and many factors begin to interact with each other.  Maybe the artist side of the scientist or the engineer side would need to take over now and then.


Superstitious models

January 18, 2011

Superstition is a form of overfitting in human learning.

It’s likely to happen when the human has no clue or no good clue of what’s going on.  Here, all kinds of noisy clues can become important for the human to predict what’s going to happen.

Similarly, a machine learning model is likely to get superstitious when no good clues are provided.  This is probably the most severe form of overfitting, and the model can even perform worse than a random baseline or a constant guess.

Unfortunately, this is happening to my research experiment now.  I know that there are good features out there, but they are much more expensive to compute, and I don’t want to get to use them.

I guess the only way out is to try to see if it’s possible to get some simple but reasonably good features.

Superstitious models..



June 15, 2007