Substrings in Java

A substring is a contiguous segment of a string; in other words, it is a part of another string. When you call substring, the start index is inclusive and the end index is exclusive.

Substrings

String s = "this is an example";
String a = s.substring(11);                // from index 11 to the end: "example"
String b = s.substring(5, 10);             // from index 5 up to, but not including, index 10: "is an"
String c = s.substring(5, s.length() - 3); // from index 5 up to, but not including, the last 3 characters: "is an exam"

Substrings can also be used to slice a String and rebuild it with characters added or replaced. For instance, suppose you receive a date containing Chinese characters but want to store it as a well-formatted date String.

String datestring = "2015年11月17日";
datestring = datestring.substring(0, 4) + "-" + datestring.substring(5, 7) + "-" + datestring.substring(8, 10);
// Result will be 2015-11-17

The substring method extracts a piece of a String. When given one parameter, that parameter is the start index and the piece extends to the end of the String. When given two parameters, the first is the index of the first character and the second is the index of the character right after the end (the character at that index is not included). An easy check: subtracting the first parameter from the second should yield the expected length of the resulting string.
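The length check described above can be sketched as follows. The class and method names (SubstringLengthCheck, slice) are made up for illustration; only String.substring itself comes from the standard library.

```java
class SubstringLengthCheck {
    // Hypothetical helper: returns s.substring(start, end) after confirming
    // that the result has exactly (end - start) characters.
    static String slice(String s, int start, int end) {
        String piece = s.substring(start, end);
        if (piece.length() != end - start) {
            throw new IllegalStateException("unexpected length");
        }
        return piece;
    }

    public static void main(String[] args) {
        String s = "this is an example";
        System.out.println(slice(s, 5, 10)); // is an
        System.out.println(s.substring(11)); // example
        // With one parameter, the length is s.length() minus the start index.
        System.out.println(s.substring(11).length() == s.length() - 11); // true
    }
}
```

An out-of-range index makes substring throw a StringIndexOutOfBoundsException, so the helper never reaches its own length check in that case.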

Version < Java SE 7

In JDK versions before 7u6, the substring method instantiates a String that shares the same backing char[] as the original String and has its internal offset and count fields set to the start and length of the result. Such sharing may cause memory leaks, which can be prevented by calling new String(s.substring(…)) to force creation of a copy, after which the original char[] can be garbage collected.
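A minimal sketch of that defensive copy, assuming a hypothetical helper named detached. It only matters on the older JDK versions described above, although it still compiles and runs on newer ones.

```java
class DetachedSubstring {
    // Hypothetical helper: wraps the substring in new String(...) so that,
    // on JDK versions before 7u6, the result gets its own char[] instead of
    // sharing the backing array of the (possibly very large) source String.
    static String detached(String source, int start, int end) {
        return new String(source.substring(start, end));
    }

    public static void main(String[] args) {
        // Build a large String: a short header followed by 100000 filler characters.
        String large = "header:" + new String(new char[100000]).replace('\0', 'x');
        String header = detached(large, 0, 6);
        System.out.println(header); // header
    }
}
```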

Version ≥ Java SE 7

From JDK 7u6 on, the substring method always copies the requested range of the underlying char[] array into a new array, making the complexity linear rather than constant but guaranteeing the absence of this kind of memory leak.

Platform independent new line separator

Since the new line separator varies from platform to platform (e.g. \n on Unix-like systems or \r\n on Windows) it is often necessary to have a platform-independent way of accessing it. In Java it can be retrieved from a system property:

System.getProperty("line.separator")

Version ≥ Java SE 7

Because the new line separator is so commonly needed, from Java 7 on a shortcut method returning exactly the same result as the code above is available:

System.lineSeparator()

Note: Since it is very unlikely that the new line separator changes during the program’s execution, it is a good idea to store it in a static final variable instead of retrieving it from the system property every time it is needed.
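The note above can be sketched like this. PlatformLines, NEW_LINE and joinLines are illustrative names, not part of any library.

```java
class PlatformLines {
    // Looked up once when the class is initialized, then reused everywhere.
    static final String NEW_LINE = System.lineSeparator();

    // Hypothetical helper: terminates every line with the platform separator.
    static String joinLines(String... lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(NEW_LINE);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines("line 1", "line 2"));
    }
}
```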

When using String.format, use %n rather than \n or \r\n to output a platform-independent new line separator.

System.out.println(String.format("line 1: %s.%nline 2: %s%n", lines[0], lines[1]));
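As a self-contained sketch of the %n usage, with a made-up helper twoLines and sample data standing in for the lines array:

```java
class FormatNewLine {
    // Hypothetical helper: formats two lines, each terminated by %n.
    static String twoLines(String first, String second) {
        return String.format("line 1: %s.%nline 2: %s%n", first, second);
    }

    public static void main(String[] args) {
        String[] lines = {"alpha", "beta"}; // sample data for illustration
        System.out.print(twoLines(lines[0], lines[1]));
        // %n expands to the same value that System.lineSeparator() returns.
        System.out.println(String.format("%n").equals(System.lineSeparator())); // true
    }
}
```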

Reversing Strings

There are a couple of ways to reverse a string.

  1. StringBuilder/StringBuffer:
String code = "code";
System.out.println(code);
StringBuilder sb = new StringBuilder(code);
code = sb.reverse().toString();
System.out.println(code);
  2. Char array:
String code = "code";
System.out.println(code);
char[] array = code.toCharArray();
for (int index = 0, mirroredIndex = array.length - 1; index < mirroredIndex; index++, mirroredIndex--) {
     char temp = array[index];
     array[index] = array[mirroredIndex];
     array[mirroredIndex] = temp;
}
// print reversed
System.out.println(new String(array));
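Both approaches can be wrapped as reusable methods, sketched below. The class and method names are chosen here for illustration only.

```java
class StringReverser {
    // Approach 1: let StringBuilder do the work.
    static String reverseWithBuilder(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Approach 2: swap characters from both ends of a char array.
    static String reverseWithCharArray(String s) {
        char[] array = s.toCharArray();
        for (int i = 0, j = array.length - 1; i < j; i++, j--) {
            char temp = array[i];
            array[i] = array[j];
            array[j] = temp;
        }
        return new String(array);
    }

    public static void main(String[] args) {
        System.out.println(reverseWithBuilder("code"));   // edoc
        System.out.println(reverseWithCharArray("code")); // edoc
    }
}
```

One difference worth knowing: StringBuilder.reverse() treats a surrogate pair as a single character, while the plain char swap reverses the two halves of the pair, so the results can differ for text containing characters outside the Basic Multilingual Plane.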

Adding toString() method for custom objects

Suppose you have defined the following Person class:

public class Person {
    String name;
    int age;
    public Person (int age, String name) {
        this.age = age;
        this.name = name;
    }
}

If you instantiate a new Person object:

Person person = new Person(25, "John");

and later in your code you use the following statement in order to print the object:

System.out.println(person.toString());

you’ll get an output similar to the following:

Person@7ab89d

This is the result of the implementation of the toString() method defined in the Object class, a superclass of Person. The documentation of Object.toString() states:

The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character '@', and the unsigned hexadecimal representation of the hash code of the object. In other words, this method returns a string equal to the value of:

getClass().getName() + '@' + Integer.toHexString(hashCode())

So, for meaningful output, you’ll have to override the toString() method:

@Override
public String toString() {
    return "My name is " + this.name + " and my age is " + this.age;
}

Now the output will be:

My name is John and my age is 25

You can also write

System.out.println(person);

In fact, println() implicitly invokes the toString method on the object.
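Putting the pieces together, the complete class might look like the sketch below. The class is declared without the public modifier here only so the snippet can live in a file of any name.

```java
class Person {
    String name;
    int age;

    Person(int age, String name) {
        this.age = age;
        this.name = name;
    }

    // Replaces the default Object.toString() output with a readable description.
    @Override
    public String toString() {
        return "My name is " + this.name + " and my age is " + this.age;
    }

    public static void main(String[] args) {
        Person person = new Person(25, "John");
        System.out.println(person.toString()); // My name is John and my age is 25
        System.out.println(person);            // My name is John and my age is 25
        // String concatenation also goes through toString().
        System.out.println("Person: " + person); // Person: My name is John and my age is 25
    }
}
```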

Remove Whitespace from the Beginning and End of a String

The trim() method returns a new String with the leading and trailing whitespace removed.

String s = " Hello World!! ";
String t = s.trim(); // t = "Hello World!!"

If you trim a String that has no leading or trailing whitespace to remove, the same String instance is returned.

Note that the trim() method has its own notion of whitespace, which differs from the notion used by the Character.isWhitespace() method:

  • All characters with codes U+0000 to U+0020 (the ASCII control characters plus the space) are considered whitespace and are removed by trim(). This includes U+0020 ‘SPACE’, U+0009 ‘CHARACTER TABULATION’, U+000A ‘LINE FEED’ and U+000D ‘CARRIAGE RETURN’, but also characters like U+0007 ‘BELL’.
  • Unicode whitespace like U+00A0 ‘NO-BREAK SPACE’ or U+2003 ‘EM SPACE’ is not recognized by trim().
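The difference between the two notions of whitespace can be observed with a small sketch. TrimDemo and trimRemoves are illustrative names invented for this example.

```java
class TrimDemo {
    // Hypothetical helper: true if trim() strips the given leading character.
    static boolean trimRemoves(char c) {
        return (c + "x").trim().equals("x");
    }

    public static void main(String[] args) {
        // U+0007 BELL: removed by trim(), but not whitespace for Character.isWhitespace().
        System.out.println(trimRemoves('\u0007'));            // true
        System.out.println(Character.isWhitespace('\u0007')); // false

        // U+2003 EM SPACE: kept by trim(), but whitespace for Character.isWhitespace().
        System.out.println(trimRemoves('\u2003'));            // false
        System.out.println(Character.isWhitespace('\u2003')); // true

        // U+00A0 NO-BREAK SPACE: kept by trim().
        System.out.println(trimRemoves('\u00A0'));            // false

        // Nothing to remove: trim() hands back the very same instance.
        String s = "Hello";
        System.out.println(s.trim() == s);                    // true
    }
}
```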
