Fast MySql REGEXP LIKE Search Solution

Recently had the job of improving the speed of a clients PHP script MySql query which used REGEXP (RLIKE)

The table was large >50k rows and negative matches needed to be used, the original query was not written using FullText Match Against Boolean due to the over 50% threshold limit and the Stopwords would cause poor results.

As we know LIKE with wildcards are pretty fast but no use for exact word searches, example %build% would find builder and building as well as build, so the query was written as REGEXP [[:<:]]build[[:>:]] which matches End Of Words.

Most queries took >6 seconds which was simply pathetic, so how do we speed this up?

Simply use LIKE AND REGEXP, this way we only perform the REGEXP on the LIKE’ results

SELECT * FROM table WHERE field REGEXP '[[:<:]]build[[:>:]]'
AND field NOT REGEXP '[[:<:]]material[[:>:]]'
6.210 Seconds

SELECT * FROM table WHERE field LIKE '%build%' AND field NOT LIKE '%material%'

0.244 Seconds

And Now

SELECT * FROM table WHERE field LIKE '%build%' AND field NOT LIKE '%material%'
AND field REGEXP '[[:<:]]build[[:>:]]' AND field NOT
REGEXP '[[:<:]]material[[:>:]]'

0.285 Seconds

REGEXP results at LIKE Speeds.

Let me know if this works for you.

Faster REGEXP Search Faster MySql REGEXP LIKE Search
Movie Maker MP4 Movie Maker Import MP4 3GP
Movie Maker Profiles A selection of Movie Maker Profiles
Simple Password Protect Page A lite simple PHP password protect script
Develop Our Wordpress website development service
Projects Information on a small selection of our projects
Tips Some tips and ideas we would like to share
Wordpress Plugins and Functions for Wordpress Developers

3 Comments

Paul Bennett on the 24th of November 2009

Hi there, It looks like the initial LIKE is doing the bulk of the filtering - could this be where the performance gains are coming from? From what I know, MySQL will optimize the query before running it and will get the first result set using the LIKE and then use the 2nd and 3rd REGEXP's to filter the result set further before returning it (but I could be wrong) Paul

mykel on the 29th of July 2009

Hi, Looks good, although I think in your specific example the AND field NOT LIKE '%material%' portion will exclude some results that you wouldn't necessarily want to exclude. e.g. if field was "build materialx" it would be excluded in your optimized but not in the original regex version.

Nurziya on the 21st of May 2009

how it works?! some kind of magic?))

Leave a comment




Name and email address required, website must be related
08445 008 009