The Oracle REGEXP_SUBSTR()
function is an advanced version of the SUBSTR()
function that allows you to search for substrings based on a regular expression. Instead of returning the position of the substring, it returns a portion of the source string that matches the regular expression.
Syntax
The following illustrates the syntax of the Oracle REGEXP_SUBSTR()
function:
REGEXP_SUBSTR(source_string, pattern
[, start_position
[, occurrence
[, match_parameter
[, subexpr
]
]
]
]
)
Code language: SQL (Structured Query Language) (sql)
Arguments
The Oracle REGEXP_SUBSTR()
function accepts 6 arguments:
1) source_string
is a string to be searched for.
2) pattern
is the regular expression pattern that is used to search for in the source string.
3) start_position
is positive integer that indicates the starting position in the source string where the search begins.
The start_position
argument is optional. Its default value is 1. Therefore, if you don’t specify it explicitly, the REGEXP_SUBSTR()
function will start searching at the beginning of the source string.
4) occurrence
is a positive integer that specifies which occurrence of the search pattern that the REGEXP_SUBSTR()
function should search for.
The occurrence
argument is also optional and it defaults to 1, meaning that the REGEXP_SUBSTR()
function should search for the first occurrence of the pattern in the source string.
5) match_parameter
is a literal string that determines the default matching behavior for the REGEXP_SUBSTR()
function.
You can use one or more following values for the match_parameter
argument:
- ‘i’ indicates case-insensitive matching
- ‘c’ indicates case-sensitive matching.
- ‘n’ allows the period (.) character to match the newline character. If you don’t explicitly specify this parameter, the
REGEXP_SUBSTR()
function will not use the period to match the newline character. - ‘m’ treats the source string as a multiline string.
Because the match_parameter
argument is optional, therefore, if you omit it, the REGEXP_SUBSTR()
function will behave as follows:
- Case sensitivity matching is determined by
NLS_SORT
parameter. - The period (.) does not match the newline character.
- The source string is treated as a single line.
6) subexpr
is a positive integer whose value from 0 to 9 that indicates which sub-expression in the regular expression is the target.
Return Value
The REGEXP_SUBSTR()
function returns a portion of source string that matches the regular expression.
The data type of the result string can be either VARCHAR2 or CLOB and its character set is the same as the source string.
Examples
Suppose, we have the following string:
'This is a regexp_substr demo'
Code language: SQL (Structured Query Language) (sql)
If you want to get the fourth word of the above string, you use the REGEXP_SUBSTR()
function as follows:
SELECT
regexp_substr( 'This is a regexp_substr demo', '[[:alpha:]]+', 1, 4
) the_4th_word
FROM
dual;
Code language: SQL (Structured Query Language) (sql)
The result is
regexp_substr
Code language: SQL (Structured Query Language) (sql)
In this example:
'[[:alpha:]]+'
is the regular expression pattern that matches any word.- 1 instructs the function to start searching for the beginning of the string.
- 4 specifies the fourth occurrence.
See the following page more information on regular expression support on Oracle.
To return every word of the source string, you can use the CONNECT BY LEVEL
clause where the LEVEL
keyword is used as the fourth argument as follows:
SELECT
regexp_substr( 'This is a regexp_substr demo', '[[:alpha:]]+', 1, LEVEL ) regexp_substr
FROM
dual
CONNECT BY LEVEL <= regexp_count( 'This is a regexp_substr demo', ' ' ) + 1;
Code language: SQL (Structured Query Language) (sql)
Note that the total of match occurrences is calculated by counting the number of spaces in the source string plus 1 by using the REGEXP_COUNT()
function.
See the following products
table in the sample database:
The following statement selects all MotherBoards from the products
table:
SELECT
product_id,
product_name,
description
FROM
products
WHERE
category_id = 4
ORDER BY
product_name ;
Code language: SQL (Structured Query Language) (sql)
Suppose, you want to get the maximum RAM that a motherboard can support, you can extract this information from the description column using the following regular expression:
'\d+(GB|TB)'
Code language: SQL (Structured Query Language) (sql)
This regular expression means to match one or more numbers (\d+) followed by either GB or TB string (GB|TB).
The following query illustrates the idea:
SELECT
product_id,
product_name,
description,
REGEXP_SUBSTR( description, '\d+(GB|TB)' ) max_ram
FROM
products
WHERE
category_id = 4;
Code language: SQL (Structured Query Language) (sql)
Here is the output:
In this tutorial, you have learned how to use the Oracle REGEXP_SUBSTR()
to extract a portion of string based on a regular expression.