View Single Post
Old Jun 12th, 2005, 7:20 PM   #6
DaWei
Resident Grouch
 
DaWei's Avatar
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
If one has a data set that can be described (or closely approximated) by a function, then an interpolative search using that function should be very effective, since the initial "guess" zeroes in rapidly regardless of where the "needle" is in the "haystack." A binary search repeatedly tosses away half the set on each iteration, which means if the "needle" is near the extremes, the search will take longer than if it's near the center (although it's still very effective).

This rough little program deals with a linear data set. The same gains should hold true for any data set which can be approximated by a function, e.g. exponential, or whatever.

There are three functions available for filling the set. They are not operator selectable, though that could easily be added. I just alter the source code and go. One function, "linearFill," results in an essentially perfect linearity. A second function, "perturbedFill," perturbs the perfect value by a random amount. The amount of the variance is such that the value for any given element will not "encroach" on the range of values for either neighboring element. The third function, "randomFill," fills the vector with random numbers in the range zero to maxValue, then sorts the vector.

The maximum value is currently set to one million. The size of the vector is set to 70 elements strictly for plotting purposes. After the vector is filled the numerical values are displayed, then plotted. The plot is 70 units wide. The range of values is scaled to fit in 25 lines.

I can imagine circumstance where something like measurement samples would result in a more or less predictable data set. I can't imagine that one would search the set for a precise value. More likely, one would search for a value that fell within some tolerance range. Consequently, I have rewritten my first shot to allow a tolerance to be specified for the "needle."

After the set values are displayed and plotted the user is asked for a "needle" value and a tolerance. An interpolative search is made, followed by a binary search. The results are displayed, including the number of tries required to either find the value (within the tolerance spec) or determine that it is not within the set.

No time was spent in minimizing the code or in organizing it into separate source/header files. I have played with it a small amount. It behaves as I expect. It might be riddled with gotchas and glitches. They would have surfaced by now, but it's waiting for my boss to come look over my shoulder before it crashes. I have been concealing from my system the fact that I no longer have a boss, so it doesn't know not to wait for an opportune time.
#include <iostream>
#include <vector>
#include <algorithm>
#include <iomanip>
#include <string>
#include <cmath>
#include <ctime>

using namespace std;

typedef vector <int> dataSet;
const unsigned setSize = 70;
const unsigned maxValue = 1000000;

int  compare (int a, int b, int tol);
void showData (dataSet& set, unsigned size);
void plotData (dataSet& set, unsigned size);
void showResults (int idx, unsigned trials);
void linearFill (dataSet& set, unsigned max, unsigned size);
void perturbedFill (dataSet& set, unsigned max, unsigned size);
void randomFill (dataSet& set, unsigned max, unsigned size);
unsigned interpSearch (dataSet& theSet, unsigned low, unsigned high, int theVal, int tol, unsigned *tries);
unsigned binarySearch (dataSet& theSet, unsigned low, unsigned high, int theVal, int tol, unsigned *tries);

int uhOh (string badNews)
{
    cerr << badNews << endl;
    return 1;
}
int main (int argc, char *argv [])
{
    dataSet theSet;
    unsigned tries;
    int theVal, index, tolerance;
    unsigned maxRand = (unsigned) sqrt ((double) maxValue);
    srand ((unsigned) time (0));
    while (true)
    {
        // Generate dataset
        randomFill (theSet, maxValue, setSize);
        showData (theSet, setSize);
        plotData (theSet, setSize);

        cout << "Enter a positive integer in the range 0-" << (maxValue-1) << ": ";
        cin.sync ();
        cin >> theVal;
        if ((theVal < 0) || (theVal >= maxValue) || !cin.good ())
            return uhOh ("Aborting, pay attention next time");
        cout << "Enter a tolerance value in the range 0-" << (maxValue/setSize) << ": ";
        cin.sync ();
        cin >> tolerance;
        if ((tolerance < 0) || (tolerance > (maxValue/setSize)) || !cin.good ())
            return uhOh ("Aborting, invalid input");
        tries = 0;
        index = interpSearch (theSet, 0, setSize-1, theVal, tolerance, &tries);
        cout << "Interpolative search: ";
        showResults (index, tries);
        tries = 0;
        index = binarySearch (theSet, 0, setSize-1, theVal, tolerance, &tries);
        cout << "Binary search: ";
        showResults (index, tries);
        cout << "Press ENTER for next iteration\n";
        cin.sync ();
        cin.get ();
    }
	return 0;
}
unsigned interpSearch (dataSet& theSet, unsigned low, unsigned high, 
                       int theVal, int tol, unsigned *pTries)
{
    unsigned divs, theSlot;

    (*pTries)++;
    if ((theVal < (theSet [low] - tol)) || (theVal > (theSet [high] + tol))) return -1;

    if (low >= high)
    {
        if (compare (theSet [low], theVal, tol) == 0) return low;
        else return -1;
    }
    divs = high-low;
    theSlot = (theVal - theSet [low]) * divs / (theSet [high] - theSet [low]) + low;

    if (compare (theSet [theSlot], theVal, tol) == 0) return theSlot;
    else if (compare (theSet [theSlot], theVal, tol) > 0) 
            return interpSearch (theSet, low, theSlot-1, theVal, tol, pTries);
    else    return interpSearch (theSet, theSlot+1, high, theVal, tol, pTries);
}
unsigned binarySearch (dataSet& theSet, unsigned low, unsigned high, 
                       int theVal, int tol, unsigned *pTries)
{
    unsigned theSlot;
    (*pTries)++;
    if ((compare (theVal, theSet [low],  tol) < 0) || 
        (compare (theVal, theSet [high], tol) > 0)) return -1;

    if (low >= high)
    {
        if (compare (theSet [low], theVal, tol) == 0) return low;
        else return -1;
    }
    theSlot = (high-low)/2 + low;
    if (compare (theSet [theSlot], theVal, tol) == 0) return theSlot;
    else if (compare (theSet [theSlot], theVal, tol) > 0) 
            return binarySearch (theSet, low, theSlot-1, theVal, tol, pTries);
    else    return binarySearch (theSet, theSlot+1, high, theVal, tol, pTries);
}
void showData (dataSet& set, unsigned size)
{    
    cout << "The set:\n";
    unsigned nLines = size / 10;
    if (size % 10) nLines++;
    for (unsigned i = 0; i < nLines; ++i)
    {
        for (unsigned j = 0; j < 10; ++j)
        {
            cout << setw (6) << set [i*10+j] << ' ';
        }
        cout << endl;
    }
    cout << endl;
}
void plotData (dataSet& set, unsigned size)
{
    int min = set [0];
    int max = set [size-1];
    int incr = max / 25;
    for (int i = 24; i >= 0; --i)
    {
        cout << '|';
        for (int j = 0; j < 70; ++j)
        {
            if ((set [j] < i*incr) && (set [j] > i*incr-incr)) cout << 'x'; else cout << (char) ' ';
        }
        cout << endl;
    }
    cout << "|----------------------------------------------------------------------";
    cout << endl;
}
void showResults (int idx, unsigned trials)
{
    string sTries = " tries";
    if (trials == 1) sTries = " try";
    if (idx >= 0 ) cout << "Found in element " << idx
                          << " with " << trials;
    else cout << "Determined not to be in set after " << trials;
    cout << sTries << endl;
}
void linearFill (dataSet& set, unsigned max, unsigned size)
{
    unsigned incr = max / (size-1);
    for (unsigned i = 0; i < size; ++i) set.push_back (i * incr);
}
void perturbedFill (dataSet& set, unsigned max, unsigned size)
{
    unsigned incr = max / (size-1);
    for (unsigned i = 0; i < size; ++i) set.push_back (i * incr + (rand () % incr));
}
void randomFill (dataSet& set, unsigned max, unsigned size)
{
    for (unsigned i = 0; i < size; ++i) set.push_back ((rand () * rand ()) % max);
    sort (set.begin (), set.end ());
}
int  compare (int a, int b, int tol)
{
    if ((a - b + tol) < 0) return -1;
    if ((a - b - tol) > 0) return 1;
    return 0;
}
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
DaWei is offline   Reply With Quote