2. Finding & Evaluating Information

2.2. What’s a Search Engine?

Let's start by looking at how a search engine actually works. This is slightly technical, but if we don't understand how a tool works, we can't evaluate the work it does for us, or improve it. And at that point, we're not using the tool — it's using us. We'll try to keep it basic. And while this explanation uses web searching as its example, keyword searching works the same way in a library database as it does on the open web.

Indexes and what's in them

When you search the web, you are not actually searching the web. You are searching a giant index of the web that a company has made. This is true of a library database as well: you're searching an index of the database, not the records themselves. That's because searching the database itself would take far too much time. A search index serves the same purpose as the index of a book. Imagine if you had to scan every page of a thick reference book one by one to find the topic you were looking for!

A search index is far more thorough than the index in the back of a book, though. To generate its index, a search company uses computer programs called “spiders” to “crawl” the web, following one link and then another, to find new pages. The spider then takes in the full content of those pages and breaks them into words and phrases, storing not only information about what words are on each page, but where those words appear, how close words appear to one another, and how important they seem to be. Then, when you type in a search query, the search engine compares your query with its index.
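To make the idea of an index concrete, here is a minimal sketch in Python. It is an illustration only, not how any real search engine is built: the three pages and their addresses are made up, and the function names build_index and search are ours. The index records which pages each word appears on and at which positions, and a search looks things up in the index without rereading the pages.

```python
from collections import defaultdict

# Hypothetical toy "web": made-up page addresses mapped to their text.
PAGES = {
    "a.example/intro": "digital literacy is a skill",
    "b.example/news": "literacy goes digital in schools",
    "c.example/cats": "cats are not digital",
}

def build_index(pages):
    """Map each word to the pages it appears on and its positions there."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(url, []).append(position)
    return index

def search(index, query):
    """Return pages containing every query word, consulting only the index."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [set(index.get(word, {})) for word in words]
    return set.intersection(*matches)

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "digital literacy")))
```

Because positions are stored too, an index like this could also tell whether two words sit next to each other on a page, which is the kind of detail the paragraph above describes.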

What does this mean for you?

First, a spider can find only pages that are linked from other pages whose location it knows. If a web page isn't linked from anywhere, it probably can't be found. Millions, perhaps billions, of pages on the web can't be found this way. (Neither can the records in most library databases.) Nor can spiders find pages that are protected by passwords. So, in fact, a search engine can help you find only a fraction of the pages on the web.
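A small sketch can show why unlinked pages stay hidden. The link map below is invented for illustration, and crawl is a hypothetical name for a simplified spider: it starts from pages it already knows and follows links outward. Real crawlers are far more elaborate, but the limitation is the same.

```python
from collections import deque

# Hypothetical link map: each page lists the pages it links to.
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1"],
    "post-1": [],
    "orphan": ["home"],  # links out, but no page links to it
}

def crawl(links, seeds):
    """Follow links breadth-first from the seed pages; return every page reached."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found

FOUND = crawl(LINKS, ["home"])
MISSED = set(LINKS) - FOUND
print(sorted(FOUND))
print(sorted(MISSED))
```

Starting from "home", the spider never reaches "orphan", because no page it visits links there.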

[DIAGRAM – use the spider and web as metaphor]

Second, each index works differently. Exactly how any given company indexes the web is a closely guarded secret. Even common databases have their own ways of indexing information. This means that, for example, Google may give more weight than Bing does to where the phrase “digital literacy” appears on a given page, or that Yahoo may recognize that “literacy goes digital” has the same words in a different order and consider it equivalent, while Ask.com may not.
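The word-order example can be sketched as two matching rules. These are hypothetical rules written for illustration; they do not describe how any of the companies named above actually match queries. One rule insists on the exact phrase, and the other accepts the same words in any order.

```python
def exact_phrase_match(query, text):
    """True only if the query words appear together, in the same order."""
    q = query.lower().split()
    words = text.lower().split()
    return any(words[i:i + len(q)] == q for i in range(len(words) - len(q) + 1))

def any_order_match(query, text):
    """True if every query word appears somewhere in the text, in any order."""
    return set(query.lower().split()) <= set(text.lower().split())

TEXT = "literacy goes digital"
print(exact_phrase_match("digital literacy", TEXT))
print(any_order_match("digital literacy", TEXT))
```

The same page is a match under one rule and not under the other, which is one reason two search tools can return different results for the same query.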

Rankings

A web search for “digital literacy” may return millions of results. How does a search engine decide which to show you first?

The first factor has to do with the content of the page itself — how closely it matches your query, how many times those words appear, and where the words appear on the page. If the title of the page contains the exact phrase you searched for, it's likely to be ranked high. If your query appears exactly as you typed it several times on the page, that will also boost the page's ranking. If the words you typed are found once each scattered around the page, the page will appear in your results, but won't be ranked as high.
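Here is a rough sketch of content-based ranking. The weights (10 for a title match, 3 for each exact repeat in the body, 1 for merely containing the words) are arbitrary numbers chosen for this example, and content_score is a hypothetical function; real search engines do not publish their formulas.

```python
def content_score(query, title, body):
    """Score a page on how well its title and body match the query."""
    query, title, body = query.lower(), title.lower(), body.lower()
    page_words = set(title.split()) | set(body.split())
    if not all(word in page_words for word in query.split()):
        return 0  # a query word is missing, so the page is not a result
    score = 1                      # every query word appears somewhere
    if query in title:
        score += 10                # exact phrase in the title
    score += 3 * body.count(query) # each exact repeat in the body
    return score

# Made-up pages: (title, body)
PAGES = {
    "A": ("Digital Literacy Basics",
          "digital literacy matters because digital literacy is everywhere"),
    "B": ("School news", "literacy rates rose after the digital rollout"),
    "C": ("Cats", "cats sleep a lot"),
}

SCORES = {name: content_score("digital literacy", title, body)
          for name, (title, body) in PAGES.items()}
RANKED = sorted((n for n in SCORES if SCORES[n] > 0),
                key=lambda n: SCORES[n], reverse=True)
print(SCORES)
print(RANKED)
```

Page A has the phrase in its title and twice in its body, so it ranks first. Page B contains both words, scattered, so it still appears but ranks lower. Page C does not appear at all.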

Second, search programmers try to find ways that machines can emulate the human process of evaluating information, so that your search results aren't filled with junk. Google's PageRank algorithm, for example, evaluates pages based in part on the number and kind of incoming links. If lots of other pages link to a page, that page is judged more reliable and therefore ranked higher, especially if the pages linking to it are themselves linked to by lots of other pages, and so on.
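The "and so on" in that description can be sketched as a repeated calculation. What follows is a simplified, textbook-style version of the PageRank idea, not Google's actual system: the link map is made up, the function name pagerank is ours, and the damping value of 0.85 is a commonly cited figure that we assume here. Each round, every page passes its score along its outgoing links, so a link from a well-linked page counts for more.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: repeatedly share each page's score along its links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link map: each page lists the pages it links to.
LINKS = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}

RANK = pagerank(LINKS)
for page in sorted(RANK, key=RANK.get, reverse=True):
    print(page, round(RANK[page], 3))
```

In this toy example "hub" comes out on top because three pages link to it, "a" comes second with two incoming links, and "b" and "c", which nothing links to, tie for last.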

Learn more

In this video, Google gives a pretty good explanation of how its search works.

Wait — what was that he said about advertising? We'll come back to that a little later [LINK to Business of Search]. First, let's look at how to build an effective search query.