Splits a string using a regular expression into an array of values.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
as | string | optional[a] | _splitstring | Emit selected attribute using this name. |
by | string | required | | String or regular expression to split by. |
field [b] | string | optional[a] | @rawstring | Field that needs splitting. |
index | number | optional[a] | | Emit only this index after splitting. Can be negative; -1 designates the last element. |
[a] Optional parameters use their default value unless explicitly set.
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
splitString("value",by="value")
and:
splitString(field="value",by="value")
These examples show basic structure only.
splitString() Syntax Examples
Assuming an event has @rawstring="2007-01-01 test bar", you can split the string into fields part[0], part[1], and part[2]:
...
| part := splitString(field=@rawstring, by=" ")
Assuming an event has @rawstring:
2007-01-01 test bar
You can pick out the date part using:
...
| date := splitString(field=@rawstring, by=" ", index=0)
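Because index can be negative, the last element can be selected in the same way. A minimal sketch against the same @rawstring; the field name last is only an illustrative choice, and the expected value follows from the parameter description rather than from a tested query:

```logscale
// Assumes @rawstring is: 2007-01-01 test bar
// index=-1 designates the last element, so last should hold "bar"
last := splitString(field=@rawstring, by=" ", index=-1)
```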
Assuming an event has @rawstring:
<2007-01-01>test;bar
You can split the string into attributes part[0], part[1], and part[2]. In this case, the splitting string is a regex specifying any one of the characters <, >, or ;
...
| part := splitString(field=@rawstring, by="[<>;]")
Split an event into multiple events by newlines. The first function, splitString(), creates @rawstring[0], @rawstring[1], ... for each line, and the following split() creates the multiple events from the array of rawstrings.
...
| splitString(by="\n", as=@rawstring)
| split(@rawstring)
Split the value of a string field into individual characters:
characters := splitString(my_field, by="(?!\A)(?=.)")
Split the value of a string using case-insensitive regex:
characters := splitString(my_field, by="(?i)(e|encoded|enc)")
Split the string using a multi-character separator. This can be used for system logs that use a multi-character separator so that a character such as a comma or colon, which might otherwise be used as a separator, can appear in the values. Because the value of the by parameter is a regular expression, you should use a regular expression group as the value. For example:
splitString(by="(\*\|\*)")
This splits incoming data by the string *|* and would correctly split the string image.png*|*PNG*|*0755*|*john into:
Field | Value |
---|---|
_splitstring[0] | image.png |
_splitstring[1] | PNG |
_splitstring[2] | 0755 |
_splitstring[3] | john |
Note
Special characters (including asterisk and pipe) also need to be escaped.
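To tie the as and index parameters to this example, here is a hedged sketch. It assumes @rawstring holds the sample string above; the names meta and owner are hypothetical, and the expected values are inferred from the table and the parameter descriptions rather than from a tested query:

```logscale
// Assumes @rawstring is: image.png*|*PNG*|*0755*|*john
// Name the output array meta instead of the default _splitstring,
// which should give meta[0]=image.png ... meta[3]=john
splitString(by="(\*\|\*)", as=meta)
// Keep only the last element (index=-1) in a field named owner,
// which should hold "john"
| owner := splitString(by="(\*\|\*)", index=-1)
```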
splitString() Examples
Deduplicate Compound Field Data With array:union() and split()
Query
splitString(field=userAgent,by=" ",as=agents)
|array:filter(array="agents[]", function={bname=/\//}, var="bname")
|array:union(array=agents,as=browsers)
| split(browsers)
Introduction
Fields that contain multiple occurrences of a value, perhaps separated by a single character, can be deduplicated in a variety of ways. This solution uses array:union() to create a unique array and split() to then split the content out into a unique list.
For example, when examining the humio repository and looking for the browsers or user agents that have used your instance, the UserAgent data will contain the browser and the toolkits used to support it, for example:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
The actual names are the Name/Version pairs showing compatibility with different browser standards. Resolving this into a simplified list requires splitting up the list, simplifying (to remove duplicates), filtering, and then summarizing the final list.
Step-by-Step
Starting with the source repository events.
- logscale
splitString(field=userAgent,by=" ",as=agents)
First we split up the userAgent field using a call to splitString() and place the output into the array field agents. This will create individual entries in the agents array for each event:
agents[0] | agents[1] | agents[2] | agents[3] | agents[4] | agents[5] | agents[6] | agents[7] | agents[8] | agents[9] | agents[10] | agents[11] | agents[12] |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Mozilla/5.0 | (Macintosh; | Intel | Mac | OS | X | 10_15_7) | AppleWebKit/537.36 | (KHTML, | like | Gecko) | Chrome/116.0.0.0 | Safari/537.36 |
- logscale
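|array:filter(array="agents[]", function={bname=/\//}, var="bname")
Next, array:filter() keeps only the agents entries that contain a forward slash, that is, the Name/Version pairs.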
- logscale
|array:union(array=agents,as=browsers)
Using array:union() we aggregate the list of user agents across all the events to create a list of unique entries. This eliminates duplicates where the value of the user agent is the same. The event data now looks like this:
browsers[0] | browsers[1] | browsers[2] |
---|---|---|
Gecko/20100101 | Safari/537.36 | AppleWebKit/605.1.15 |
An array of the individual values.
- logscale
| split(browsers)
Using split() will split the array into individual events, turning:
browsers[0] | browsers[1] | browsers[2] |
---|---|---|
Gecko/20100101 | Safari/537.36 | AppleWebKit/605.1.15 |
into:
_index | row[1] |
---|---|
0 | Gecko/20100101 |
1 | Safari/537.36 |
2 | AppleWebKit/605.1.15 |
Event Result set.
Summary and Results
The resulting output from the query is a list of events with each event containing a matching _index and browser. This can be useful if you want to perform further processing on a list of events rather than an array of values.