The regex() function works both as a filter and as a way to extract new fields using a regular expression. The regular expression can contain one or more named capturing groups; fields with the names of the groups are added to the events.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
field | string | optional[a] | @rawstring | Specifies the field to run the regular expression against. |
flags | string | optional[a] | m | Specifies regex modifier flags. Values: d (period (.) also includes newline characters), i (ignore case for matched values), m (multi-line parsing of regular expressions). |
limit | integer | optional[a] | 100 | Defines the maximum number of events to produce. A warning is produced if this limit is exceeded, unless the parameter is specified explicitly. |
regex[b] | string | required | | Specifies a regular expression. The regular expression can contain one or more named capturing groups. Fields with the names of the groups will be added to the events. |
repeat | boolean | optional[a] | false | If set to true, multiple matches yield multiple events. Values: false (match at most one event), true (match multiple events). |
strict | boolean | optional[a] | true | Specifies if events not matching the regular expression should be filtered out of the result set. Values: false (events not matching the regular expression are not filtered out; fields are only added when the regular expression matches), true (events not matching the regular expression are filtered out of the result set). |
[a] Optional parameters use their default value unless explicitly set.
[b] The parameter name regex can be omitted.
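As a minimal sketch of how the optional parameters combine, the query below assumes a hypothetical useragent field containing values such as Firefox/126.0; the field name and the capture group names browser and version are illustrative only.

```logscale
// Assumption: "useragent" is a hypothetical field, e.g. "Mozilla/5.0 ... Firefox/126.0".
// flags="i"    : also matches "Firefox", "FIREFOX", and so on
// strict=false : events that do not match are kept; browser/version are not added to them
regex("(?<browser>firefox|chrome)/(?<version>\\d+)", field=useragent, flags="i", strict=false)
```

Because strict=false is set, this line no longer acts as a filter; leave strict at its default of true to keep only matching events.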
Omitted Argument Names
The argument name for regex can be omitted; the following forms of this function are equivalent:
regex("value")
and:
regex(regex="value")
These examples show basic structure only.
Negatable Function Operation
This function is negatable, meaning that the result can be inverted. For example:
!regex()
Or:
not regex()
For more information, see Negating the Result of Filter Functions.
Regular expressions in LogScale allow you to search (filter) and extract information, and are a very common part of the LogScale language and syntax.
LogScale uses JitRex which closely follows — but does not entirely replicate — the syntax of RE2J regular expressions, which is very close to Java's regular expressions. See Regular Expression Syntax for more information.
Note
To ensure compatibility, it is recommended to always test your regular expressions inside LogScale, instead of a 3rd party regex tool.
Escaping Characters
Care needs to be taken when escaping characters in the regular expression submitted to the regex() function. The function uses the \ backslash character to indicate that an individual character is escaped, which in many common situations means the original (literal) character is matched. This works for all characters except the backslash itself. Within regex() you must double-escape the backslash, because it needs to be escaped once for definition within the string, and then again when the regular expression is parsed.
This can cause complexities when looking for filenames that use the backslash (for example, the Windows filename \Windows\tmp\myfile.txt).
The following regular expression will not work as expected:
regex("\\(?<file_name>[^\\]+$)")
The regular expression is trying to capture all the text after the last \ character (the file name).
However, because we are submitting a string to regex(), the regular expression will be expanded to:
\(?<file_name>[^\]+$)
Because the backslash is only escaped once, the expression will fail. Instead, escape the backslash twice:
regex("\\\\(?<file_name>[^\\\\]+$)")
Two alternatives exist to avoid this:
- Use the ASCII character code (\x5c) to specify the backslash:
regex("\x5c\x5c(?<file_name>[^\x5c\x5c]+$)")
- Use the /regex/ syntax, which is only parsed once and so only needs to be escaped once:
/\\(?<file_name>[^\\]+$)/
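The working forms can also be applied to a specific field. This sketch assumes a hypothetical field named path holding Windows paths such as \Windows\tmp\myfile.txt, for which file_name should be captured as myfile.txt; use one line or the other, since both perform the same extraction.

```logscale
// Assumption: "path" is a hypothetical field, e.g. \Windows\tmp\myfile.txt
// String form: the backslash is written four times (escaped for the string, then for the regex).
regex("\\\\(?<file_name>[^\\\\]+$)", field=path)
// Equivalent /regex/ form, where the backslash is only escaped once:
// path = /\\(?<file_name>[^\\]+$)/
```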
The operations of regex() and /regex/ are summarized in the table below:
Operation | regex() | /regex/ |
---|---|---|
Default search | @rawstring | All defined or parsed fields and @rawstring (not tags, @id or timestamp fields) |
Specific field search | Using the field parameter | Using field = /regex/ |
Note that foo = /regex/ and regex("regex", field=foo) are equivalent; the latter has the benefit that more parameters can be used to refine the search. Specifically, it allows you to specify strict=false. The former has the benefit that the regular expression is not written as a string, and therefore some characters (such as the backslash) do not need to be escaped twice.
/regex/ without a field name specifies a free-text search, which searches all fields. When used in a query, it searches exactly the fields as they were in the original event, and it works only before the first aggregator.
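To illustrate the equivalence, the sketch below assumes a hypothetical url field. The commented-out /regex/ line filters the same events as the regex() call would without strict=false; adding strict=false, which only the function form accepts, keeps events whose url does not match.

```logscale
// Assumption: "url" is a hypothetical field, e.g. /user/alice/pay
// /regex/ form (the / characters in the pattern must be escaped here):
// url = /\/user\/(?<userid>\w+)/
// Function form with strict=false: non-matching events are kept, without a userid field.
regex("/user/(?<userid>\\w+)", field=url, strict=false)
```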
The difference in search scope between the two regex syntax operations introduces a significant performance difference between the two. Using regex() searches only the specified field (@rawstring by default) and can be significantly more performant than the /regex/ syntax, depending on the number of fields in the dataset.
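As a rough sketch of the difference in scope, assuming a hypothetical message field: the commented-out free-text search checks @rawstring and all defined or parsed fields of each event, whereas the regex() call only checks message.

```logscale
// Free-text search across @rawstring and all defined or parsed fields (broader):
// /timeout/
// Scoped search that only examines the hypothetical "message" field:
regex("timeout", field=message)
```

The two are only interchangeable when the text of interest can appear in message alone; otherwise the scoped form returns fewer events.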
Using g in flags
When performing queries, the g option (for global, as in repeating) is allowed in a query, but is not an acceptable value for the flags parameter.
To get multiple matches with regex(), set the repeat parameter to true instead.
For more information, see Global (Repeating) Matches.
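A minimal sketch of repeat=true, assuming @rawstring holds several key=value pairs such as user=alice action=login status=ok; the group names key and value are illustrative. Per the parameter table, each match produces its own event, capped by the limit parameter (default 100).

```logscale
// Assumption: @rawstring looks like "user=alice action=login status=ok".
// repeat=true: every key=value match yields a separate event,
// instead of only the first match.
regex("(?<key>\\w+)=(?<value>\\S+)", repeat=true)
// Count how often each key occurs across the produced events.
| groupBy(key, function=count())
```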
regex() Syntax Examples
Extract the domain name of the HTTP referrer field. Often this field contains a full URL, so we can have many different URLs from the same site. In this case we want to count all referrals from the same domain. This will add a field named refdomain to events matching the regular expression.
regex("https?://(www\\.)?(?<refdomain>.+?)(/|$)", field=referrer)
| groupBy(refdomain, function=count())
| sort(field=_count, type=number, reverse=true)
Extract the user ID from the url field. The extracted value is stored in a new field named userid.
regex(regex="/user/(?<userid>\\S+)/pay", field=url)
Show how to escape " in the regular expression. This is necessary because the regular expression is itself in quotes. Extract the user and message from events like Peter: "hello" and Bob: "good morning".
regex("(?<name>\\S+): \"(?<msg>[^\"]+)\"")
Note
There are no default flags for a regular expression. For example:
@rawstring=/expression/
is syntactically equivalent to:
regex("expression")
Or:
regex("expression", flags="")
When using flags:
@rawstring=/expression/m
is syntactically equivalent to:
regex("expression", flags="m")
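Following the same pattern, a case-insensitive search with the i flag can be sketched as follows; error is just an example search term, and the equivalence with /error/i is assumed by analogy with the m example above.

```logscale
// Expected to behave like @rawstring=/error/i, by analogy with the m flag example.
// Matches "error", "Error", "ERROR", and so on in @rawstring.
regex("error", flags="i")
```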
regex() Examples
Extract the Top Most Viewed Pages of a Website
Query
regex(regex="/.*/(?<url_page>\S+\.page)", field=url)
| top(url_page, limit=12, rest=others)
Introduction
Your LogScale repository is ingesting log entries from a web server for a photography site. On this site there are several articles about photography. The URL for articles on this site ends with the extension .page instead of .html.
You want to extract the page users viewed and then list the top most viewed pages.
Step-by-Step
Starting with the source repository events.
- logscale
regex(regex="/.*/(?<url_page>\S+\.page)", field=url)
Extracts the page viewed by users by returning the name of the file from the url field and storing that result in a field named url_page.
- logscale
| top(url_page, limit=12, rest=others)
Lists the most viewed pages. The first parameter is the url_page field created in the first line of the query. The second parameter limits the results to the top twelve instead of the default limit of ten. Because we're curious about how many pages were viewed during the selected period that were not listed in the top twelve, the rest parameter is specified with the label to use for them.
Event Result set.
Summary and Results
The table lists the pages from most viewed to least viewed during the selected period, limited to the top twelve.
url_page | _count |
---|---|
home.page | 51 |
index.page | 21 |
home-studio.page | 10 |
a-better-digital-camera.page | 7 |
is-film-better.page | 6 |
leica-q-customized.page | 6 |
student-kit.page | 4 |
focusing-screens.page | 4 |
changing-images-identity.page | 2 |
others | 27 |
Filter Out Based on a Non-Matching Regular Expression (Function Format)
Query
responsesize > 2000
| not regex("/falcon-logscale-.*/",field=url)
Introduction
This example searches weblog data for log entries whose response size is larger than a specified value, but whose URL is not in a specific directory.
Step-by-Step
Starting with the source repository events.
- logscale
responsesize > 2000
Filters for events where the responsesize field is greater than 2000.
- logscale
| not regex("/falcon-logscale-.*/",field=url)
Negates the regular expression match, filtering out any URL that contains a directory starting with the prefix falcon-logscale-, and returning all other events. Event Result set.
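Since regex() is negatable, the same query can also be written with the ! form of negation described earlier; this sketch is expected to return the same events.

```logscale
// Keep large responses...
responsesize > 2000
// ...except those whose url contains a /falcon-logscale-.../ directory
| !regex("/falcon-logscale-.*/", field=url)
```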
Summary and Results
For example, given the following events:
@timestamp | #repo | #type | @id | @ingesttimestamp | @rawstring | @timestamp.nanos | @timezone | client | httpversion | method | responsesize | statuscode | url | userid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_0_6401_1719982743 | 2024-07-03T04:59:41 | 192.168.1.240 - - [03/07/2024:04:59:03 +0000] "GET /js/htmllinkhelp.js HTTP/1.1" 200 23 | 0 | Z | 192.168.1.240 | HTTP/1.1 | GET | 23 | 200 | /js/htmllinkhelp.js | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_0_6400_1719982743 | 2024-07-03T04:59:41 | 192.168.1.24 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis-1.100/css-images/external-link.svg HTTP/1.1" 200 1072 | 0 | Z | 192.168.1.24 | HTTP/1.1 | GET | 1072 | 200 | /data-analysis-1.100/css-images/external-link.svg | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_0_6399_1719982743 | 2024-07-03T04:59:41 | 192.168.1.209 - - [03/07/2024:04:59:03 +0000] "GET /js/htmllinkhelp.js HTTP/1.1" 304 - | 0 | Z | 192.168.1.209 | HTTP/1.1 | GET | - | 304 | /js/htmllinkhelp.js | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_0_6398_1719982743 | 2024-07-03T04:59:41 | 192.168.1.39 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis/js/java.min.js HTTP/1.1" 304 - | 0 | Z | 192.168.1.39 | HTTP/1.1 | GET | - | 304 | /data-analysis/js/java.min.js | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_0_6397_1719982743 | 2024-07-03T04:59:41 | 192.168.1.62 - - [03/07/2024:04:59:03 +0000] "GET /falcon-logscale-cloud/js/php.min.js HTTP/1.1" 200 6397 | 0 | Z | 192.168.1.62 | HTTP/1.1 | GET | 6397 | 200 | /falcon-logscale-cloud/js/php.min.js | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_0_6396_1719982743 | 2024-07-03T04:59:41 | 192.168.1.206 - - [03/07/2024:04:59:03 +0000] "GET /integrations/js/theme.js HTTP/1.1" 200 14845 | 0 | Z | 192.168.1.206 | HTTP/1.1 | GET | 14845 | 200 | /integrations/js/theme.js | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_0_6395_1719982743 | 2024-07-03T04:59:41 | 192.168.1.1 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis/js/json.min.js HTTP/1.1" 200 496 | 0 | Z | 192.168.1.1 | HTTP/1.1 | GET | 496 | 200 | /data-analysis/js/json.min.js | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_0_6394_1719982743 | 2024-07-03T04:59:41 | 192.168.1.252 - - [03/07/2024:04:59:03 +0000] "GET /falcon-logscale-cloud/js/java.min.js HTTP/1.1" 200 2739 | 0 | Z | 192.168.1.252 | HTTP/1.1 | GET | 2739 | 200 | /falcon-logscale-cloud/js/java.min.js | - |
Might return the following values:
@timestamp | #repo | #type | @id | @ingesttimestamp | @rawstring | @timestamp.nanos | @timezone | client | httpversion | method | responsesize | statuscode | url | userid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_2_6541_1719982743 | 2024-07-03T05:03:48 | 192.168.1.231 - - [03/07/2024:04:59:03 +0000] "GET /logscale-repo-schema/js/corp.js HTTP/1.1" 200 18645 | 0 | Z | 192.168.1.231 | HTTP/1.1 | GET | 18645 | 200 | /logscale-repo-schema/js/corp.js | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_2_6538_1719982743 | 2024-07-03T05:03:48 | 192.168.1.69 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis-1.100/images/dashboards.png HTTP/1.1" 200 152590 | 0 | Z | 192.168.1.69 | HTTP/1.1 | GET | 152590 | 200 | /data-analysis-1.100/images/dashboards.png | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_2_6535_1719982743 | 2024-07-03T05:03:47 | 192.168.1.154 - - [03/07/2024:04:59:03 +0000] "GET /integrations/js/theme.js HTTP/1.1" 200 14845 | 0 | Z | 192.168.1.154 | HTTP/1.1 | GET | 14845 | 200 | /integrations/js/theme.js | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_2_6534_1719982743 | 2024-07-03T05:03:47 | 192.168.1.58 - - [03/07/2024:04:59:03 +0000] "GET /integrations/images/extrahop.png HTTP/1.1" 200 10261 | 0 | Z | 192.168.1.58 | HTTP/1.1 | GET | 10261 | 200 | /integrations/images/extrahop.png | - |
2024-07-03T04:59:03 | weblogs | httpsimp | MqHKxw2QoBPZyNqbJRRs4ECC_2_6527_1719982743 | 2024-07-03T05:03:47 | 192.168.1.164 - - [03/07/2024:04:59:03 +0000] "GET /integrations/images/zeek.png HTTP/1.1" 200 4392 | 0 | Z | 192.168.1.164 | HTTP/1.1 | GET | 4392 | 200 | /integrations/images/zeek.png | - |
Filter Out Based on a Non-Matching Regular Expression (Syntax)
Query
method != /(PUT|POST)/
Introduction
This example searches weblog data looking for events where the method does not match a specified value.
Step-by-Step
Starting with the source repository events.
- logscale
method != /(PUT|POST)/
This line performs a negative regular expression match, returning only the events where the method does not match either PUT or POST. Event Result set.
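The pattern in this query is unanchored, so it excludes any method that merely contains PUT or POST. As a sketch, a stricter variant in the function format anchors the pattern so that only the exact methods PUT and POST are excluded:

```logscale
// ^ and $ anchor the match to the whole value of the method field,
// so only methods that are exactly PUT or POST are filtered out.
!regex("^(PUT|POST)$", field=method)
```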
Summary and Results
This format of the query is a simple way to perform a negative regular expression match, that is, to return the events that do not match the given regular expression.
Get Integer Part of Number
Get the integer part of a number using the regex() function and regex capturing groups
Query
regex("(?<b>\\d+)\\..*",field=a)
Introduction
In this example, regex pattern matching with a named capturing group is used to extract the digits before the decimal point in field a and store them in a new field named b, leaving the original field a unchanged.
See also alternative method mentioned under the summary.
Step-by-Step
Starting with the source repository events.
- logscale
regex("(?<b>\\d+)\\..*",field=a)
Captures one or more digits (\d+) in the named group b, followed by a literal period (\. ) and zero or more further characters (.*). The backslashes are doubled because the regular expression is written as a string. For example, if a contains 123.45, b is set to 123. Because strict defaults to true, events where a does not contain digits followed by a period are filtered out. Event Result set.
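By default strict=true, so events whose a field contains no digits followed by a period are dropped. If those events should be kept, a variant with strict=false can be sketched as:

```logscale
// strict=false keeps events where a has no decimal part (e.g. "42");
// b is only added where the pattern matches (e.g. "42.7" gives b=42).
regex("(?<b>\\d+)\\..*", field=a, strict=false)
```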
Summary and Results
The query with regex pattern matching and a named capturing group is used to get the integer part of a number, storing the matched value in a new field named b.
The query using the regex() function is primarily used for pattern matching and extraction, as regex is generally very concise for simple extraction tasks.
There is another way of achieving the same end result using the replace() function in a query like this: replace("(\\d+)\\..*", with="$1", field=a, as=b). This query uses the replace() function with numbered references to perform substitution, whereas the first one uses regex pattern matching with a named capturing group.
The query using replace() captures the digits before the decimal point in an unnamed group, (\\d+), and explicitly creates a new field b with the result.
The query using the replace() function is more commonly used for string manipulation and transformation in a replacement operation.
Replace Word or Substring With Another
Replace a word or substring with another in an event set using the replace() function with a regular expression
Query
replace(regex=propperties, with=properties)
Introduction
In this example, the replace() function is used to correct a spelling mistake.
Step-by-Step
Starting with the source repository events.
- logscale
replace(regex=propperties, with=properties)
Replaces the word propperties with the word properties. Event Result set.
Summary and Results
The query is used to correct spelling mistakes in an event set. Changing words or other substrings with a regular expression like this is useful in many situations where field values need to be changed quickly.
Search for Command Line String
Search for command line string after / and before @ using a regular expression
Query
#event_simpleName=ProcessRollup2
| CommandLine=/@/
| CommandLine=/\/.*@/
Introduction
A regular expression can be used to run a query that looks for command line strings containing any characters after / and before @. It is important to perform as much filtering as possible to avoid exceeding resource limits.
In this example, a regular expression is used to filter and search for specific process events in the CrowdStrike Falcon platform. Note that the query filters on the @ character alone first, to perform as much filtering as possible before the more specific regular expression is applied.
Step-by-Step
Starting with the source repository events.
- logscale
#event_simpleName=ProcessRollup2
Filters for events of the type ProcessRollup2 in the #event_simpleName field.
- logscale
| CommandLine=/@/
Filters for any command line containing the @ symbol.
- logscale
| CommandLine=/\/.*@/
Uses a regular expression to search the returned results for command lines that contain a forward slash (/) followed by any number of characters, and then an @ symbol. Event Result set.
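The last step can also be written with the regex() function, which additionally lets you extract the matched text. In this sketch the field name user_part is hypothetical; the pattern uses [^@]* rather than .* so that the capture stops at the first @ after the slash, while still selecting the same events as the /\/.*@/ filter.

```logscale
#event_simpleName=ProcessRollup2
| CommandLine=/@/
// Same filter as CommandLine=/\/.*@/, but also extracts the text between
// the / and the next @ into the hypothetical field user_part.
| regex("/(?<user_part>[^@]*)@", field=CommandLine)
```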
Summary and Results
The query is used to search for command line strings that contain any characters after / and before @.
The query could, for example, be used to help security analysts identify
potentially suspicious processes that might be interacting with email
addresses or using email-like syntax in their command lines.
Truncate a String or Message
Truncate a string or message to exactly 100 characters using the replace() function and regex capturing groups
Query
replace("^(.{100}).*", with="$1", field=message, as="truncated_message")
Introduction
In this example, the replace() function is used together with a regex capturing group to truncate a string: it cuts off the last part of a message so that only the first 100 characters are shown, and then stores the truncated string in the new field truncated_message, leaving the field message untouched.
Step-by-Step
Starting with the source repository events.
- logscale
replace("^(.{100}).*", with="$1", field=message, as="truncated_message")
The capturing group (.{100}) matches exactly 100 characters of any type starting from the beginning of the line (^), and .* matches the rest of the line. The result is returned in a new field named truncated_message; the original message field remains unchanged.
with="$1" means that the entire match is replaced with the contents of the first capturing group, in this case the first 100 characters. Event Result set.
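Truncation can also be sketched with regex() itself, by capturing up to the first 100 characters in a named group. Unlike the .{100} pattern, .{1,100} also matches messages shorter than 100 characters, capturing them whole. Note that . does not match newlines unless the d flag is set, and that events without a message would be filtered out because strict defaults to true.

```logscale
// Capture at most the first 100 characters of message into truncated_message;
// message itself is not modified. .{1,100} is greedy, so it takes 100
// characters when available and the whole (single-line) message otherwise.
regex("^(?<truncated_message>.{1,100})", field=message)
```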
Summary and Results
The query is used to truncate strings.
Truncation can, for example, be used to speed up download times and complete searches faster. In file systems, the truncate operation is used to reduce the size of a file by removing data from the end. This can be helpful when you need to reclaim storage space or when dealing with log files that need to be periodically truncated.
Another advantage of truncation is that it allows you to search for a word that could have multiple endings. This broadens the results to include variations of the word.
Truncation of numbers is also useful to shorten digits past a certain point in the number.