Quick Links

Full-Text Search is a database technique which will retrieve records even if they don't exactly match your search criteria. This enables rich natural language searching that feels more familiar.

Searching for "database engine" using a full-text search will return results containing the phrases "database," "engine" or "database engine". This resolves possible ambiguities in your search criteria, so rows like "I have a database and an engine" still show up.

Simple searches in MySQL can be made using the

        LIKE
    

operator. This is inefficient and limited in functionality. MySQL usually needs to perform a full-table scan to identify records matching your query.

Full-text queries use a specially created index to improve performance. This also enables MySQL to keep track of the words within your dataset, facilitating the natural language search.

You can only use full-text search with columns that have a full-text index. Use the

        FULLTEXT
    

clause in

        CREATE TABLE
    

statements to set this up when you seed your database schema. You can also use it with

        ALTER TABLE
    

to add a search index to existing columns.

CREATE TABLE articles(content TEXT, FULLTEXT (content));
    

ALTER TABLE articles ADD FULLTEXT (content);

With the index in-place, you're ready to start querying your database.

Using Full-Text Searches

Full-text searches start with a

        WHERE
    

clause. You use

        MATCH AGAINST
    

instead of

        LIKE
    

. You need to indicate the indexed columns to match, as well as the query to search for.

SELECT * FROM articles WHERE MATCH (content) AGAINST ('database engine' IN NATURAL LANGUAGE MODE);

This will perform a full-text search of the articles using the query

        database engine
    

. By specifying

        IN NATURAL LANGUAGE MODE
    

, MySQL is instructed to interpret the query literally, without processing it in any way. This is the default search mode if none is specified.

Ordering By Result Relevance

One of the advantages of full-text search is that it lets you order the returned records by relevance. This is not possible with a regular

        LIKE
    

query. You can use a

        MATCH ... AGAINST
    

clause as part of a SQL

        SELECT
    

. The returned virtual column will contain a relevance score, from 0 to 1, indicating how closely the record matched the search query.

SELECT content, MATCH (content) AGAINST ('database engine') AS relevance FROM articles ORDER BY relevance DESC

This query would return the most relevant results first. This helps your application meet user expectations of how natural language search systems should perform.

MySQL computes search relevant scores by assessing several different factors. These include the number of records which match the query, as well as the number of times the query occurs within each record. A result with several exact matches for the query will rank higher than one which only contains part of the query.

When using

        MATCH ... AGAINST
    

in a

        SELECT
    

statement, you don't need to repeat it in the

        WHERE
    

clause. You could manually filter the results to include only records with a non-zero relevance score.

SELECT content, MATCH (content) AGAINST ('database engine') AS relevance FROM articles WHERE relevance > 0 ORDER BY relevance DESC

Query Expansion Mode

Natural language isn't the only supported full-text search mode. Query expansion mode is an alternative which helps widen the range of the search results. It automatically weights the search query against the most relevant terms.

An expanded search begins by finding the records that contain a match for the query. Those records are then checked to identify the most relevant words. The database then runs another search, this time basing it on the relevant words instead of the original query. This usually results in more records being returned while maintaining an acceptable degree of relevance.

Here's how you enable query expansion mode:

SELECT * FROM articles WHERE MATCH (content) AGAINST ('database engine' WITH QUERY EXPANSION)

Boolean Mode

The final full-text search mode is boolean mode. This lets you include boolean modifiers in your query. You can use this mode when you need advanced control over the matching logic.

You can require a word to be present in each result by prefixing it with

        +
    

. Use

        -
    

to exclude results which contain the word. Other operators can be used to match parts of words, create sub-expressions and make words reduce the relevance score. The latter can be helpful when masking "noise" terms. If you don't specify an operator, it's implied the word will be referenced in the query with

        OR
    

.

SELECT * FROM articles WHERE MATCH (content) AGAINST ('+data* engine -sqlite' IN BOOLEAN MODE)

This query would surface articles with at least one word starting with

        data
    

where

        sqlite
    

is not mentioned. Each result may or may not include the word

        engine
    

.

Boolean mode lets you construct powerful searches using your own logic. One caveat is it doesn't support search relevance scoring. This is the tradeoff in giving users the ability to make boolean searches. The result ranking may not align with human expectations.

MySQL comes with several full-text configuration options which let you finetune how searches are conducted. Here's a few of the most important.

  •         innodb_ft_min_token_size
        
    - Sets the minimum word length for indexed terms. Words with fewer characters than this value won't be added to the index so you won't be able to search them. The default value is
            3
        
    , which excludes extremely common words such as
            a
        
    ,
            an
        
    and
            I
        
    . This setting applies to InnoDB tables; use
            ft_min_word_len
        
    for MyISAM.
  •         innodb_ft_max_token_size
        
    - Similarly to
            innodb_ft_min_token_size
        
    , this sets the maximum length of indexed words. Longer words won't be searchable. Use
            ft_max_word_len
        
    for MyISAM tables.
  •         innodb_ft_enable_stopword
        
    - This setting, on by default, lets you control whether MySQL should filter out "stopwords". Stopwords are very commonly used words which could unduly influence search results. The default stopword list contains 36 frequent phrases.
  •         innodb_ft_user_stopword_table
        
    - You can set this value to the name of a database table which MySQL should source the stopword list from. This table must have a single
            VARCHAR
        
    column called
            value
        
    . Add your stopwords to exclude as records in this table.

These settings are usually set in your MySQL server configuration file. The location varies by platform; it can often be found at

        /etc/mysql/my.cnf
    

. You'll need to restart the MySQL service after changing the settings.

Once the server is back up, rebuild the full-text indexes for your table. You must do this so your data is reindexed using the current configuration. Otherwise, the previously indexed data will continue to be used.

To reindex an InnoDB table, run

        OPTIMIZE TABLE my_table
    

. For MyISAM tables, use

        REPAIR TABLE my_table QUICK
    

. The search indexes will then be rebuilt so your configuration changes take effect.

Summary

MySQL Full-Text Search is enabled by adding a

        FULLTEXT
    

index to your searchable fields. You then use

        MATCH ... AGAINST
    

with one of three searching modes to get your results. Natural language queries return a search relevance score which you can use to rank your results.

Full-text search offers more powerful search behaviour than a

        LIKE
    

statement. It's also much more performant, particularly on large datasets, as all the text is indexed in advance.