Saturday, August 27, 2016

PowerShell v3 in a Year, Day 8: about_Split

It took me far too long to discover the -split operator in PowerShell. I had used similar features in JavaScript, C#, Perl and VBScript, but didn't know about this one. Man, was I missing the boat. In PowerShell, the general syntax is to place the string you want to split on the left-hand side, the -split operator in the middle, and the delimiter on the right-hand side, as below:

"Lastname:FirstName:Address"-split "(:)" 
When you run this it returns five values:

Lastname
:
FirstName
:
Address 
The delimiter is a character (or characters) that identifies the end of a substring. The default, used when you do not supply a delimiter, is whitespace, which includes spaces and control characters such as newline (`n) and tab (`t). Normally, the delimiter is omitted from the results. To preserve part (or all) of it, enclose the part you want to retain in parentheses.

For example, here is an instance where the delimiter is omitted:
"test1234test1234test123" -split "test"
Note that I am using a multi-character delimiter; you are not limited to single-character delimiters. Because the string begins with the delimiter, the first element returned is an empty string, followed by:

1234
1234
123
Now, here is an example where you retain the delimiter, test, by placing it in parentheses:
"test1234test1234test123" -split "(test)"
and its output (after an empty first element, because the string starts with the delimiter):

test
1234
test
1234
test
123
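To put dropping and keeping the delimiter side by side, here is a minimal sketch that reuses the sample strings from above. The variable names ($record, $dropped, $kept, $leading) are my own, chosen only for illustration.

```powershell
# Sample string from the start of the post; the variable names are illustrative.
$record = "Lastname:FirstName:Address"

# Plain delimiter: the colons are discarded.
$dropped = $record -split ":"

# Parenthesized delimiter: each colon comes back as its own element.
$kept = $record -split "(:)"

"Dropped: $($dropped.Count) elements"   # Dropped: 3 elements
"Kept:    $($kept.Count) elements"      # Kept:    5 elements

# A string that begins with the delimiter yields an empty first element.
$leading = "test1234test1234test123" -split "test"
"Leading: $($leading.Count) elements, first is empty: $($leading[0] -eq '')"
```

Counting the elements is a quick way to spot that empty leading string, which is easy to miss when you only look at the console output.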
NOTE: When I introduced -split above, I described the general syntax: the string to split on the left-hand side, the operator in the middle, and the delimiter on the right-hand side. The word I want to emphasize there is "general". The operator also accepts a second value after the delimiter that sets the maximum number of substrings to return.

For example, let us say we only wanted three substrings from a space-delimited list of letters, a b c d e f. How would we do that in PowerShell? Like this:
"a b c d e f" -split " ", 3
When you run this in PowerShell it returns this:
a
b
c d e f
What is happening is that PowerShell splits off the first two substrings (a and b) at the delimiter and then returns everything that remains (c d e f) as the third. This is useful when you need a fixed number of results but can't control the input. Note that a value of 0 returns all of the substrings; in PowerShell v3, negative values do the same.
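Here is a small sketch of how the max-substrings value behaves at a few different settings. It reuses the letter list from above; the variable names are hypothetical.

```powershell
# Space-delimited letters from the example above; variable names are illustrative.
$letters = "a b c d e f"

# Limit of 3: the third element holds everything that is left over.
$three = $letters -split " ", 3
$three[2]        # c d e f

# Limit of 0: no limit, every substring is returned.
$all = $letters -split " ", 0
$all.Count       # 6

# A limit larger than the number of substrings is harmless.
$many = $letters -split " ", 10
$many.Count      # 6
```

I have left negative limits out of the sketch on purpose, since I would not rely on them behaving the same way in every version of PowerShell.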

One interesting use of the right-hand side of the operator is to pass it a script block. For example, taking one straight from the help, we want to split wherever the character is an e or a p.
$c = "Mercury,Venus,Earth,Mars,Jupiter,Saturn,Uranus,Neptune"
$c -split {$_ -eq "e" -or $_ -eq "p"}
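When it is run, it returns the following (-eq is case-insensitive, so the E in Earth also matches, and the e that ends Neptune leaves an empty string at the end):
M
rcury,V
nus,
arth,Mars,Ju
it
r,Saturn,Uranus,N

tun
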
This can be very helpful if you have very specific conditions to meet or criteria to match against. Additionally, script blocks let you perform dynamic analysis, i.e., calculations, as part of the test.
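As a second, smaller illustration of a script block delimiter, the sketch below splits on every character that is not a digit. The input string and the variable names are made up for the example; inside the script block, $_ is the character currently being tested.

```powershell
# Hypothetical input: numbers separated by arbitrary single characters.
$reading = "10x20y30"

# Split wherever the current character is not a digit.
$numbers = $reading -split { $_ -notmatch "\d" }
$numbers          # 10, 20, 30 (one per line)

# The results are strings; cast them if you need to do arithmetic.
[int[]]$values = $numbers
$values[0] + $values[1] + $values[2]   # 60
```

The same split could be written as a regex ("\D"), so treat this as a sketch of the mechanics; script blocks earn their keep when the test is awkward to express as a pattern.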

Another powerful aspect of the -split operator is its option set. At present there are two sets, each with a few suboptions, to pay attention to. I have included a list of these below.

  1. SimpleMatch: Use simple string comparison when evaluating the delimiter. Cannot be used with RegexMatch.
    1. IgnoreCase: Forces case-insensitive matching, even if the -cSplit operator is specified.
  2. RegexMatch: Use regular expression matching to evaluate the delimiter. This is the default behavior. Cannot be used with SimpleMatch.
    1. IgnoreCase: Forces case-insensitive matching, even if the -cSplit operator is specified.
    2. CultureInvariant: Ignores cultural differences in language when evaluating the delimiter. Valid only with RegexMatch.
    3. IgnorePatternWhitespace: Ignores unescaped whitespace and comments marked with the number sign (#). Valid only with RegexMatch.
    4. ExplicitCapture: Ignores non-named match groups so that only explicit capture groups are returned in the result list. Valid only with RegexMatch.
    5. SingleLine: Singleline mode recognizes only the start and end of strings. Valid only with RegexMatch. Singleline is the default.
    6. MultiLine: Multiline mode recognizes the start and end of lines and strings. Valid only with RegexMatch. Singleline is the default.
These options are helpful when dealing with edge cases that require a little extra TLC to properly manage. To use them, add a third comma-separated value after the max-substrings value. For example, the help provides a here-string that we split using a regex with the multiline option. The here-string is:
$a = @'
1The first line.
2The second line.
3The third of three lines.
'@
The syntax for handling this with -split looks like this:
$a -split "^\d", 0, "multiline"
The regex ^\d matches a digit at the start of a line, the max-substrings value of 0 returns all substrings, and the "multiline" option lets ^ match at the start of every line rather than only at the start of the string. When it is run, it returns this:

The first line.

The second line.

The third of three lines.
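The SimpleMatch option is easiest to appreciate with a delimiter that means something special to the regex engine. The following is a minimal sketch with a made-up version string and illustrative variable names.

```powershell
# Hypothetical dotted string; the variable names are illustrative.
$version = "1.2.3"

# Default (RegexMatch): "." matches every character, so only empty strings are left.
$asRegex = $version -split "."
$asRegex.Count        # 6

# SimpleMatch: "." is treated as a literal dot.
$asLiteral = $version -split ".", 0, "SimpleMatch"
$asLiteral            # 1, 2, 3 (one per line)

# The regex alternative is to escape the dot.
$escaped = $version -split "\."
$escaped.Count        # 3
```

Note that the options value is the third argument, so you have to supply a max-substrings value (0 here, for "all") even when you do not want a limit.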
The fourth key point about using -split is the difference between unary and binary splitting. Unary splitting, where there is no left-hand side to the statement, has higher precedence than the comma operator, so if you hand the unary operator a comma-separated list of strings, only the first string (before the first comma) is split. For example, if we evaluate the following statement,
-split "1 2", "a b"
It returns this
1
2
a b
Since my first exposure to -split was the binary operator, this was a little confusing to me at first. It seemed, in my initial understanding (binary operator only), that you had to have both a left-hand side (LHS) and a right-hand side (RHS) to form a complete statement. For the binary operator this is true, but not for the unary operator.

When PowerShell encounters a -split operator, it decides which form you mean from what surrounds it: with nothing to its left, the operator is unary and only an RHS is expected; with an expression to its left, it is binary and has both an LHS and an RHS. It is important to keep this in mind if you ever run into odd behavior with -split. Glance at your LHS and RHS to be sure they make sense and that no stray comma or missing operand is forcing the wrong mode (unary instead of binary, or binary instead of unary) to be evaluated.
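To make the precedence point concrete, here is a short sketch that runs the same two strings through the unary operator with and without parentheses, and then through the binary operator. The variable names are illustrative only.

```powershell
# Unary -split binds tighter than the comma, so only "1 2" is split;
# "a b" is simply appended to the result untouched.
$partial = -split "1 2", "a b"
$partial[-1]      # a b

# Parentheses make the whole list the operand of the unary operator.
$full = -split ("1 2", "a b")
$full.Count       # 4

# The binary form splits every string on its left-hand side.
$binary = ("1 2", "a b") -split " "
$binary.Count     # 4
```

If a split ever returns fewer pieces than you expected, a missing pair of parentheses around the input list is a good first thing to check.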

With the binary operator, you can split a whole set of strings at once by wrapping them in parentheses on the LHS, such as,
("1 2", "3 4") -split ","
which evaluates to
1 2
3 4
Neither string contains a comma, so each one comes back unchanged.
Here are some other results. Splitting on nothing, ("1 2", "3 4") -split "", gives you,

1

2

