How To Split Strings in Java

The String.split() method in Java splits a string around matches of a given regular expression. This article covers both String.split() and StringTokenizer. The StringTokenizer class breaks a string into tokens. It is a legacy class retained for compatibility reasons, and its use is discouraged in new code.

Splitting Strings

To split a string around a particular delimiting character or a regular expression, use the String.split() method, which has the following signature:

public String[] split(String regex)

Note that the delimiting character or regular expression is removed from the resulting String array.

Example using delimiting character:

String lineFromCsvFile = "Mickey;Bolton;12345;121216";
String[] dataCells = lineFromCsvFile.split(";");
// Result is dataCells = { "Mickey", "Bolton", "12345", "121216"};

Example using regular expression:

String lineFromInput = "What do you need from me?";
String[] words = lineFromInput.split("\\s+"); // one or more whitespace characters
// Result is words = {"What", "do", "you", "need", "from", "me?"};

You can even directly split a String literal:

String[] firstNames = "Mickey, Frank, Alicia, Tom".split(", ");
// Result is firstNames = {"Mickey", "Frank", "Alicia", "Tom"};

Warning: Do not forget that the parameter is always treated as a regular expression.

"aaa.bbb".split("."); // This returns an empty array

In the previous example, . is treated as the regular expression wildcard that matches any character. Since every character is a delimiter, only empty strings remain, and because trailing empty strings are discarded, the result is an empty array.
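
To make the warning concrete, here is a minimal sketch comparing the unescaped and the escaped dot. The class name DotSplitDemo and the helper splitOnLiteralDot are illustrative names chosen for this example, not part of any library.

```java
import java.util.Arrays;

public class DotSplitDemo {

    // Hypothetical helper: escapes the dot so that it is matched literally.
    static String[] splitOnLiteralDot(String input) {
        return input.split("\\.");
    }

    public static void main(String[] args) {
        // Unescaped: "." matches every character, so only empty strings
        // remain, and split() discards trailing empty strings.
        System.out.println("aaa.bbb".split(".").length); // prints 0

        // Escaped: the dot is a literal delimiter.
        System.out.println(Arrays.toString(splitOnLiteralDot("aaa.bbb"))); // prints [aaa, bbb]
    }
}
```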

Splitting based on a delimiter which is a regex meta-character

The following characters are considered special (aka meta-characters) in regex, some of them (such as < > - = !) only in particular contexts:

< > - = ! ( ) [ ] { } \ ^ $ | ? * + .

To split a string on one of the above delimiters, you need to either escape it with a backslash (written as \\ inside a Java string literal) or use Pattern.quote():

  • Using Pattern.quote():
String s = "a|b|c";
String regex = Pattern.quote("|"); // requires import java.util.regex.Pattern
String[] arr = s.split(regex);
  • Escaping the special characters:
String s = "a|b|c";
String[] arr = s.split("\\|");
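
When the delimiter is not known in advance, quoting it is usually the safer of the two options. The following is a small sketch of that idea; splitLiteral and LiteralSplitDemo are hypothetical names introduced for this example only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplitDemo {

    // Hypothetical helper: treats the delimiter as plain text, never as a regex.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|"))); // prints [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("1.2.3", "."))); // prints [1, 2, 3]
        System.out.println(Arrays.toString(splitLiteral("x$$y", "$$"))); // prints [x, y]
    }
}
```

Because Pattern.quote() wraps the whole delimiter, multi-character delimiters such as "$$" work without escaping each character by hand.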

Split removes trailing empty values

split(delimiter) by default removes trailing empty strings from the result array. To turn this mechanism off, use the overloaded version split(delimiter, limit) with limit set to a negative value, for example:

String[] split = data.split("\\|", -1);

split(regex) internally returns the result of split(regex, 0).

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.
If the limit n is greater than zero, then the pattern will be applied at most n - 1 times, the array’s length will be no greater than n, and the array’s last entry will contain all input beyond the last matched delimiter.
If n is negative, then the pattern will be applied as many times as possible and the array can have any length.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
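
The three cases can be compared side by side. This is a minimal sketch; the sample input "a|b|c||" and the helper name splitOnPipe are assumptions made for illustration.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    // Hypothetical helper: splits on a literal pipe with the given limit.
    static String[] splitOnPipe(String input, int limit) {
        return input.split("\\|", limit);
    }

    public static void main(String[] args) {
        String data = "a|b|c||";

        // Zero: as many splits as possible, trailing empty strings discarded.
        System.out.println(Arrays.toString(splitOnPipe(data, 0)));  // prints [a, b, c]

        // Negative: as many splits as possible, trailing empty strings kept.
        System.out.println(Arrays.toString(splitOnPipe(data, -1))); // prints [a, b, c, , ]

        // Positive: at most limit - 1 splits, the rest stays in the last entry.
        System.out.println(Arrays.toString(splitOnPipe(data, 2)));  // prints [a, b|c||]
    }
}
```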

Splitting with a StringTokenizer

Besides the split() method, strings can also be split using a StringTokenizer.

StringTokenizer is even more restrictive than String.split(), and also a bit harder to use. It is essentially designed for pulling out tokens delimited by a fixed set of characters (given as a String). Each character will act as a separator. Because of this restriction, it is often reported to be about twice as fast as String.split(), though the actual difference depends on the input and the JVM version.

The default set of delimiters is the whitespace characters: space, tab, newline, carriage return, and form feed (" \t\n\r\f"). The following example prints each word separately:

String str = "the lazy fox jumped over the brown fence";
StringTokenizer tokenizer = new StringTokenizer(str);
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}

This will print out:

the
lazy
fox
jumped
over
the
brown
fence

You can use different character sets for separation.

String str = "jumped over";
// In this case the characters u and e are used as delimiters
StringTokenizer tokenizer = new StringTokenizer(str, "ue");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}

This will print out:

j
mp
d ov
r
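
One practical difference from split() is worth seeing in code: StringTokenizer never returns empty tokens, while split() keeps empty strings between adjacent delimiters. The sketch below illustrates this under the assumption of a simple comma-separated input; the tokenize helper and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplitDemo {

    // Hypothetical helper: collects all tokens of a StringTokenizer into a list.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Adjacent delimiters: the tokenizer skips the empty token, split() keeps it.
        System.out.println(tokenize("a,,b", ","));                  // prints [a, b]
        System.out.println(Arrays.toString("a,,b".split(",")));     // prints [a, , b]

        // A character class gives split() the same "any of these characters" behaviour.
        System.out.println(tokenize("jumped over", "ue"));          // prints [j, mp, d ov, r]
        System.out.println(Arrays.toString("jumped over".split("[ue]"))); // prints [j, mp, d ov, r]
    }
}
```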

Joining Strings with a delimiter

Version ≥ Java SE 8

An array of strings can be joined using the static method String.join():

String[] elements = { "foo", "bar", "foobar" };
String singleString = String.join(" + ", elements);
System.out.println(singleString); // Prints "foo + bar + foobar"

Similarly, there’s an overloaded String.join() method for Iterables.
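
As a brief sketch of the Iterable overload, the example below joins a List, which is one kind of Iterable. The helper name joinWithDash and the sample values are chosen purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class JoinIterableDemo {

    // Hypothetical helper: joins any Iterable of strings with a dash.
    static String joinWithDash(Iterable<String> parts) {
        return String.join("-", parts);
    }

    public static void main(String[] args) {
        List<String> elements = Arrays.asList("foo", "bar", "foobar");
        System.out.println(joinWithDash(elements)); // prints foo-bar-foobar
    }
}
```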

For fine-grained control over joining, you can use the StringJoiner class:

StringJoiner sj = new StringJoiner(", ", "[", "]");
// The last two arguments are optional,
// they define prefix and suffix for the result string
sj.add("foo");
sj.add("bar");
sj.add("foobar");
System.out.println(sj); // Prints "[foo, bar, foobar]"

To join a stream of strings, you may use the joining collector:

Stream<String> stringStream = Stream.of("foo", "bar", "foobar");
String joined = stringStream.collect(Collectors.joining(", "));
System.out.println(joined); // Prints "foo, bar, foobar"

There’s an option to define prefix and suffix here as well:

Stream<String> stringStream = Stream.of("foo", "bar", "foobar");
String joined = stringStream.collect(Collectors.joining(", ", "{", "}"));
System.out.println(joined); // Prints "{foo, bar, foobar}"

String concatenation and StringBuilders

String concatenation can be performed using the + operator. For example:

String s1 = "a";
String s2 = "b";
String s3 = "c";
String s = s1 + s2 + s3; // abc

Typically, a compiler implements the above concatenation using a StringBuilder under the hood (this was javac's strategy before Java 9; newer versions emit an invokedynamic call instead, with a similar effect). The compiled code behaves similarly to the following:

StringBuilder sb = new StringBuilder("a");
String s = sb.append("b").append("c").toString();

StringBuilder has several overloaded append methods for different types, for example to append an int instead of a String. An implementation can convert:

String s1 = "a";
String s2 = "b";
String s = s1 + s2 + 2; // ab2

to the following:

StringBuilder sb = new StringBuilder("a");
String s = sb.append("b").append(2).toString();

The above examples illustrate a simple concatenation operation that is effectively done in a single place in the code. The concatenation involves a single instance of the StringBuilder. In some cases, a concatenation is carried out in a cumulative way such as in a loop:

String result = "";
for (int i = 0; i < array.length; i++) {
    result += extractElement(array[i]);
}
return result;

In such cases, the compiler optimization is usually not applied, and each iteration will create a new StringBuilder object. This can be optimized by explicitly transforming the code to use a single StringBuilder:

StringBuilder result = new StringBuilder();
for (int i = 0; i < array.length; i++) {
    result.append(extractElement(array[i]));
}
return result.toString();

By default, a StringBuilder is created with an initial capacity of only 16 characters. If you know in advance that you will be building larger strings, it can be beneficial to give it a sufficient initial capacity, so that the internal buffer does not need to be resized:

StringBuilder buf = new StringBuilder(30); // Default is 16 characters
buf.append("0123456789");
buf.append("0123456789");                 // Would cause a reallocation of the internal buffer otherwise
String result = buf.toString();          // Produces a 20-chars copy of the string
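
The effect of the initial capacity can be observed with StringBuilder.capacity(). The following is a minimal sketch; the helper capacityAfterAppending is a hypothetical name, and the exact capacity after growth is an implementation detail, so only a lower bound is checked for that case.

```java
public class CapacityDemo {

    // Hypothetical helper: appends 20 characters and reports the resulting capacity.
    static int capacityAfterAppending(StringBuilder buf) {
        buf.append("0123456789").append("0123456789");
        return buf.capacity();
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity());   // prints 16
        System.out.println(new StringBuilder(30).capacity()); // prints 30

        // Pre-sized buffer: 20 characters fit, so the capacity is unchanged.
        System.out.println(capacityAfterAppending(new StringBuilder(30))); // prints 30

        // Default buffer: 20 characters do not fit in 16, so the buffer had to grow.
        System.out.println(capacityAfterAppending(new StringBuilder()) >= 20); // prints true
    }
}
```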

If you are producing many strings, it is advisable to reuse StringBuilders:

StringBuilder buf = new StringBuilder(100);
for (int i = 0; i < 100; i++) {
    buf.setLength(0);     // Empty buffer
    buf.append("This is line ").append(i).append('\n');
    outputfile.write(buf.toString());
}

If (and only if) multiple threads write to the same buffer, use StringBuffer, which is a synchronized version of StringBuilder. Because usually only a single thread writes to a buffer, StringBuilder, which has no synchronization overhead, is usually the faster choice.
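
As an illustration of the multi-threaded case, the sketch below lets two threads append to one shared StringBuffer. The class and method names are hypothetical, and the thread count and character are arbitrary choices for the example. Because each append call is synchronized, no character is lost.

```java
public class SharedBufferDemo {

    // Hypothetical helper: two threads each append perThread characters
    // to the same StringBuffer; returns the final length.
    static int appendFromTwoThreads(int perThread) {
        StringBuffer shared = new StringBuffer();
        Runnable writer = () -> {
            for (int i = 0; i < perThread; i++) {
                shared.append('x'); // synchronized, so safe to call concurrently
            }
        };
        Thread first = new Thread(writer);
        Thread second = new Thread(writer);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromTwoThreads(1000)); // prints 2000
    }
}
```

With a StringBuilder in place of the StringBuffer, the same code would have no such guarantee, since StringBuilder is not designed for concurrent writers.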

Using the concat() method:

String string1 = "Hello ";
String string2 = "world";
String string3 = string1.concat(string2); // "Hello world"

This returns a new string that is string1 with string2 added to it at the end. You can also use the concat() method with string literals, as in:

"My name is ".concat("Buyya");
