MySQL full-text search — part 2
Introduction
The previous post outlined what full-text search is and why it matters. Here we go deeper into syntax and usage of full-text search in MySQL.
We will use the following sample schema for the examples in this post and the next:
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title, body)
);
INSERT INTO articles (title, body)
VALUES
('MySQL Tutorial', 'DBMS stands for DataBase ...'),
('How To Use MySQL Well', 'After you went through a ...'),
('Optimizing MySQL', 'In this tutorial we will show ...'),
('1001 MySQL Tricks', '1. Never run mysqld as root. 2. ...'),
('MySQL vs. YourSQL', 'In the following database comparison ...'),
('MySQL Security', 'When configured properly, MySQL ...');
Natural language mode and Boolean mode
MATCH (col1, col2, ...) AGAINST (expr [search_modifier])
search_modifier:
{
IN NATURAL LANGUAGE MODE
| IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
| IN BOOLEAN MODE
| WITH QUERY EXPANSION
}
Natural language mode
In natural language mode, search follows concepts (rather than exact token matching like boolean mode) for the free-text query you supply. MATCH … AGAINST returns a relevance score (a floating-point value) based on how well each matched document fits the query; the better the fit, the higher the rank. (How “fit” is defined is covered later.)
To inspect the rank for a keyword, you can run:
SELECT id, ROUND(MATCH(title, body) AGAINST ('database'), 7) AS relevance
FROM articles;
Boolean mode
In boolean mode, search is driven by the terms you pass in, and rows are not ordered by relevance by default. Unlike natural language mode, you rely on explicit operators (+, -, ~, *, parentheses, phrase quotes) rather than corpus-wide term statistics alone, to require, exclude, or weight terms.
Example:
SELECT *
FROM articles
WHERE MATCH (title, body) AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
This returns only rows that contain MySQL and must not contain YourSQL.
You can combine boolean operators so MySQL ranks rows according to your intent. Examples:
-
'apple banana'— Rows that contain at least one of the two terms. -
'+apple +juice'— Rows that must contain both terms. -
'+apple macintosh'— Rows that must containapple;macintoshis optional (boosts rank if present). -
'+apple -macintosh'— Rows that must containappleand must not containmacintosh. -
'+apple ~macintosh'— Rows that must containapple;macintoshis optional, but rows withoutmacintoshrank higher. -
'+apple +(>turnover <strudel)'— Rows that containappleand eitherturnoverorstrudel, with “apple turnover” ranked above “apple strudel”. -
'apple*'— Prefix search: matchesapple,apples,applesauce,applet, and so on. -
'"some words"'— Phrase search: exact phrase some words.
Query expansion
With query expansion (WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION), MySQL runs two search passes. On the second pass it expands the original query with related prominent terms from the first hit set.
Example: searching for "database" with the sample data above:
SELECT * FROM articles WHERE MATCH (title, body) AGAINST ('database');
With query expansion, MySQL notices that among rows matching database, MySQL appears often as a co-occurring term, so it also returns rows that contain MySQL even if they do not contain database.
Per the MySQL manual, query expansion can increase noise and irrelevant hits, so it is best reserved for short queries.
When natural language mode returns few or no rows
Prefer boolean mode when the dataset is very small (few rows) or when test data repeats the same content. Natural language mode weights terms by document frequency within a row and across the table: the more common a term is globally, the lower its weight—so MySQL may return an empty set quite often on tiny or repetitive corpora.
Boolean mode and special characters in keywords
Boolean mode treats (, ), +, -, ~, >, *, " as operators. That clashes with keywords that contain those characters—for example SBD-1107, where - is a hyphen, not the boolean exclude operator.
One workaround is to add a predicate such as WHERE title LIKE 'SBD-1107%' (or the appropriate column). As in the previous post, a trailing wildcard still allows index use (no leading %), so performance stays acceptable.
The next post covers how MySQL ranks full-text matches and how the full-text index works.
References
Official MySQL 8.0 Reference Manual entries for topics in this post:
Overview and syntax
- Full-text search functions —
MATCH (…) AGAINST (…)grammar, search modifiers, supported engines and column types, and links to the sections below.
Search modes
-
Natural language full-text searches — free-text queries, relevance scores, default sort order, and
ROUND(MATCH … AGAINST …)style expressions. -
Boolean full-text searches — required/optional/excluded terms,
~,>,*, grouping, phrase searches, and default ordering (useORDER BY MATCH …when you need relevance order). -
Full-text searches with query expansion — two-pass expansion (
WITH QUERY EXPANSION/IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION).
Edge cases and hybrid filters
-
Full-text restrictions — limitations and parser behavior that interact with real-world queries.
-
The
LIKEoperator — string matching when boolean metacharacters clash with literal tokens (e.g. hyphens in codes); pair with Column indexes for how indexes apply toLIKEpredicates.