String.IndexOf vs. Regex.Match

While building a text editor, I needed to implement search, since no text editor would be complete without it! But how would I implement it?

Before giving in to the default, knee-jerk reaction of “use string.IndexOf,” I thought to myself, “hey, wait a minute: regular expressions are supposed to be fast.” In fact, their whole premise is that they’re efficiently implemented.

What if I used a regular expression to search instead? I could simply put the search text in (properly escaped), and a match would reveal the location of the text.
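A minimal sketch of that idea follows. The class name RegexSearch and the method name Find are hypothetical, made up for illustration; Regex.Escape and Match.Index are the standard .NET calls. It escapes the search text so it is matched literally, then reads the location off the match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class RegexSearch
{
    // Returns the zero-based index of the first literal occurrence of
    // searchText in text, or -1 if it does not occur.
    public static int Find(string text, string searchText)
    {
        // Regex.Escape neutralises metacharacters such as '$', '.', and '('
        // so the user's input is treated as plain text, not as a pattern.
        Match match = Regex.Match(text, Regex.Escape(searchText));
        return match.Success ? match.Index : -1;
    }

    static void Main()
    {
        // "$4.50" contains two regex metacharacters; escaping makes them literal.
        Console.WriteLine(Find("Price: $4.50 (approx.)", "$4.50")); // prints 7
        Console.WriteLine(Find("Price: $4.50 (approx.)", "4x50"));  // prints -1
    }
}
```

Without the call to Regex.Escape, the "$" in the search text would be read as an end-of-input anchor and the "." as a wildcard, so the search could miss text that is really there or match text that is not.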

I decided to run a test: searching through 10,000 words’ worth of text for a single word that occurred only once, at the very end of the text. I ran a for-loop in which I used either IndexOf or a regex’s Match to find the result.
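A test along those lines might look roughly like the sketch below. This is a reconstruction under assumptions, not the original test code: the filler word "lorem", the target word "needle", and the names SearchBenchmark, BuildText, and Time are all invented for illustration. Timings will vary with the machine, the .NET version, and the current culture, so no expected output is shown.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical benchmark harness, for illustration only.
static class SearchBenchmark
{
    // Builds wordCount filler words followed by one occurrence of target.
    public static string BuildText(int wordCount, string target)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < wordCount; i++)
        {
            sb.Append("lorem ");
        }
        sb.Append(target);
        return sb.ToString();
    }

    // Runs the given search the requested number of times and returns the elapsed time.
    public static TimeSpan Time(int iterations, Func<int> search)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            search();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        const string target = "needle";           // assumed search word
        string text = BuildText(10000, target);   // 10,000 words, target at the very end
        var regex = new Regex(Regex.Escape(target));

        // Note: IndexOf(string) with no StringComparison argument performs a
        // culture-sensitive comparison using the current culture.
        TimeSpan indexOfTime = Time(5000, () => text.IndexOf(target));
        TimeSpan regexTime = Time(5000, () => regex.Match(text).Index);

        Console.WriteLine("IndexOf:     " + indexOfTime);
        Console.WriteLine("Regex.Match: " + regexTime);
    }
}
```

The Regex object is constructed once, outside the timed loop, so the measurement covers matching only and not pattern construction.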

And the result? The winner was not surprising, but the size of the gap was: over 5,000 iterations of searching, IndexOf clocked in at 18 seconds, while Regex.Match clocked in at a quarter of a second.

Wait a sec. That means that Regex.Match, in this case, is roughly 70 times faster than IndexOf (18 s / 0.25 s = 72). Wow. That’s well over an order of magnitude.

Based on this test, I would therefore recommend avoiding IndexOf in favour of Regex.Match for sizable texts (say, more than 1,000 words or so). It might actually save you a ton of time. And of course, you need to properly escape your search string (Regex.Escape does this), so that any regular-expression special characters the user enters are treated as literal text.

About Ashiq Alibhai, PMP

Ashiq has been coding C# since 2005. A desktop, web, and RIA application developer, he's touched ASP.NET MVC, ActiveRecord, Silverlight, NUnit, and all kinds of exciting .NET technologies. He started C# City in order to accelerate his .NET learning.

2 Responses to String.IndexOf vs. Regex.Match

  1. Mike Zach says:

    This is a false statement (in practice). If you loaded 10,000 words into a single string (i.e. load the contents of a file from a stream into a string, or in any other manner), without an unnecessary loop, and you have no desire to match on a pattern, string.IndexOf() is immensely faster. In a real application, looping over every word (i.e. splitting the string into an array on a space) is wasteful. However, if you need pattern matching, Regex is obviously the way to go. If time is critically important, Boyer-Moore or some other algorithm would need to be considered (likely in an unmanaged component, i.e. C/C++); however, at that point you’re talking about cases where milliseconds or microseconds matter.

    Moral of the story: branch your logic between string.IndexOf() and Regex.IsMatch() depending on whether you need pattern matching.
    i.e.:

    return ((pbMatchPattern) ? Regex.IsMatch(sContents, sSearchVal) : (sContents.IndexOf(sSearchVal) > -1));

    Of course, this is assuming that for whatever reason, you can’t load this data into SQL (or at least some type of database) in the first place.

  2. Johnny Boy says:

    I would like to know your testing environment, because every article and test I’ve seen shows Regex being the slowest.

    For example:
    http://cc.davelozinski.com/c-sharp/c-fastest-way-to-check-last-character-of-a-string

    Regex is great for matching patterns and such, don’t get me wrong. But when it comes to speed and/or micro-optimizing code, Regex should not be considered at all.
