Programming Forums
User Name Password Register
 

RSS Feed
FORUM INDEX | TODAY'S POSTS | UNANSWERED THREADS | ADVANCED SEARCH

Reply
 
Thread Tools Display Modes
Old Jul 16th, 2008, 11:15 PM   #1
gattu
Newbie
 
Join Date: Jul 2008
Posts: 2
Rep Power: 0 gattu is on a distinguished road
Question Q: Search in intervals

Hi,

I have a code where I need to search in a set of non-overlapping intervals; lets say { (1-10), (20-25), (30-40) } is my set. I need to search for a particular item, say 22, and determine which interval it lies in, (20-25) for this example.

I need to do this fast, as I have a large number of lookups. I am using a splay tree (basically a binary search tree which brings last accessed node to the root, to make searching tree fast). I was wondering if there was some way I could go down to O(1) using hash tables. But I can't figure out how I could hash intervals.

Any ideas?

~gattu~
gattu is offline   Reply With Quote
Old Jul 17th, 2008, 1:00 AM   #2
Jimbo
Battle Programmer
 
Jimbo's Avatar
 
Join Date: Feb 2006
Location: Bellevue, WA, USA
Posts: 751
Rep Power: 3 Jimbo is on a distinguished road
Re: Q: Search in intervals

Assuming your intervals are constant (e.g. it will always be {[1,10],[20,25],[30,40]} for a given dataset), you could create a hashing function which takes in a value and returns a hash code for the appropriate interval. This would get your hash lookup to O(1) for each <item, interval> mapping.

Regarding your choice of a splay tree: do you know that you'll be searching for certain data more often than other data? If not, then it's quite possible for a splay tree to be non-performant, since you always have to restructure the tree to bring the latest search to the top. This works out well if you search for a subset of the data often as those nodes will congretate towards the top, but if you are going over your data set fairly evenly then another tree structure might be just as good or better.
__________________
<insert disclaimer here>
<insert shameless plug for Visual Studio here>
Jimbo is offline   Reply With Quote
Old Jul 17th, 2008, 8:48 AM   #3
Sane
Programming Guru
 
Sane's Avatar
 
Join Date: Apr 2005
Posts: 1,827
Rep Power: 5 Sane will become famous soon enough
Re: Q: Search in intervals

A trivial way to get a query down to O(1) would be to precompute all of the values. You wouldn't need a hash function. A simple 1D array would suffice.

lookup = []

for i in num_intervals:
    for j in intervals[i][0] ... intervals[i][1]:
        lookup[j] = (intervals[i][0], intervals[i][1])

for i in num_queries:
    if lookup[queries[i]]:
        print lookup[queries[i]][0], lookup[queries[i]][1]
    else:
        print "Not In Any Interval"

This would be an <O(N), O(1)> algorithm (<Precomputing, Querying>), where N is the end point of the largest interval.

But if N could be as large as a couple billion, I doubt you can get an O(1) query without doing any such massive precomputations.

----------

Here is another really easy solution which does not require a splay tree. Just another 1D array.

An O(logN) query is usually fast enough for any situation (unless there are 10,000,000 queries). Assuming logn is okay, why not just binary search the interval.

This assumes your intervals are already sorted in increasing order.

for i in num_queries:

    low = 0
    high = num_intervals-1

    if queries[i] < intervals[0][0] or 
       queries[i] > intervals[num_intervals-1][1]:
        print "Not In Any Interval"
        continue

    while low < high:

        mid = (low+high)/2

        if intervals[mid][0] < queries[i] and 
           intervals[mid][1] < queries[i]: 
            low = mid + 1

        else if intervals[mid][0] > queries[i] and 
                intervals[mid][1] > queries[i]:
            high = mid - 1

        else:
            low = high

    print intervals[low][0], intervals[low][1]

This would be an <O(N), O(logN)> algorithm (<Precomputing, Querying>), where N is the number of intervals.

----------

Another very fast algorithm would be a line sweep.

After all of the intervals are sorted in increasing order, sort the queries in increasing order as well.

Do a horizontal sweep of the queries, keeping the sweep aligned with the intervals. The intervals are found for you. Then resort, output, and you're done.

Assuming queries are already sorted in increasing order.

j = 0

for i in num_queries:

    while j<num_intervals and intervals[j][0] <= queries[i]:
        j++

    if queries[i] < intervals[0][0] or 
       queries[i] > intervals[num_intervals-1][1]:
        print "Not In Any Interval"

    else:
        print intervals[j-1][0], intervals[j-1][1]


If the overall query time is QlogQ+N, then the average time per query is logQ+N/Q. Therefore, this would be an <O(1), O(logQ+N/Q)> algorithm (<Precomputing, Querying>), where N is the number of intervals and Q is the number of queries.

If the queries are already sorted, then this would be <O(1), O(1)>, which is by far the fastest possible! And again, it is very easy to code (much easier than a splay tree).

----------

Lastly, if you wanted your data structure to support more functions related to intervals, a range tree should be much easier to code than a splay tree.

Here is a good link related to what you're doing.

http://www.topcoder.com/tc?module=St...CommonAncestor

Last edited by Sane; Jul 17th, 2008 at 9:17 AM.
Sane is offline   Reply With Quote
Old Jul 22nd, 2008, 3:29 PM   #4
gattu
Newbie
 
Join Date: Jul 2008
Posts: 2
Rep Power: 0 gattu is on a distinguished road
Re: Q: Search in intervals

Thank you for your replies.

Unfortunately, the data structure is not static. I need to do around a billion lookups, with around a thousand insertions/deletions in between. Also there is huge temporal locality in my access patterns.

Any tree structures that take advantage of the cache that come to your mind, in case hashing is not an option?

~gattu~
gattu is offline   Reply With Quote
Reply

Bookmarks

« Previous Thread in Forum | Next Thread in Forum »

Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Preforming a binary(SQL) search on a datagrid. SydneyMcConnell Visual Basic .NET 4 Jan 10th, 2008 12:40 PM
Better Search Engine Design? bigguy Coder's Corner Lounge 4 Aug 13th, 2007 7:16 PM
How to save all search engine results urls in a text file abojan ASP 0 Dec 27th, 2006 8:33 AM
Backup Search grimpirate C 3 Jul 3rd, 2006 7:49 PM
An alternative to 'Windows Search'... SaturN Coder's Corner Lounge 9 Jul 24th, 2005 2:30 PM




DaniWeb IT Discussion Community
All times are GMT -5. The time now is 11:14 AM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC