Python | Split multiple characters from string
While coding or sharpening your programming skills, you have probably run into cases where you wanted .split() in Python to split on several different characters at once rather than on a single separator.
Example:
"GeeksforGeeks, is an-awesome! website"
Calling .split() on the above splits on whitespace only and returns
['GeeksforGeeks,', 'is', 'an-awesome!', 'website']
whereas the desired result should be
['GeeksforGeeks', 'is', 'an', 'awesome', 'website']
In this article, we will look at several ways to achieve this.
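To make the limitation concrete, here is a minimal sketch of how the built-in str.split() behaves on the example string. The variable names are illustrative only.

```python
text = "GeeksforGeeks, is an-awesome! website"

# With no argument, str.split() splits on runs of whitespace only,
# so punctuation stays attached to the words.
whitespace_tokens = text.split()
print(whitespace_tokens)
# ['GeeksforGeeks,', 'is', 'an-awesome!', 'website']

# With an argument, str.split() treats it as one literal separator string,
# not as a set of alternative characters.
literal_tokens = text.split(", ")
print(literal_tokens)
# ['GeeksforGeeks', 'is an-awesome! website']
```

Neither call can split on a comma, a hyphen, and an exclamation mark in one pass, which is what the methods below address.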
Method 1: Split multiple characters from string using re.split()
This is the most commonly used way to split on multiple characters at once, and usually the most concise. It uses regular expressions (regex) through Python's built-in re module.
Python3
import re

# initializing string
data = "GeeksforGeeks, is_an-awesome ! website"

# printing original string
print("The original string is : " + data)

# Using re.split()
# Splitting characters in String
res = re.split(', |_|-|!', data)

# printing result
print("The list after performing split functionality : " + str(res))
Output:
The original string is : GeeksforGeeks, is_an-awesome ! website
The list after performing split functionality : ['GeeksforGeeks', 'is', 'an', 'awesome ', ' website']
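The call re.split(', |_|-|!', data) tells Python to split data wherever it finds a comma followed by a space, an underscore, a hyphen, or an exclamation mark. The symbol "|" means "or". Because the pattern matches only those exact delimiters, any whitespace next to them stays in the tokens, which is why 'awesome ' and ' website' keep their spaces. Some symbols have a special meaning in regex. To split on such a symbol literally, escape it with a backslash ("\"). No spaces are needed around the escaped character, and any space you do put in the pattern is matched literally.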
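If the leftover spaces in tokens such as 'awesome ' are unwanted, one option is to let the pattern absorb whitespace around each delimiter. The following is a sketch, not the only way to do it: it swaps the alternation for a character class and adds \s* on both sides. The name pattern is illustrative.

```python
import re

data = "GeeksforGeeks, is_an-awesome ! website"

# [,_\-!] matches any one of the listed delimiters (the hyphen is escaped
# because an unescaped hyphen can denote a range inside a character class).
# \s* on each side also consumes any whitespace next to the delimiter.
pattern = r"\s*[,_\-!]\s*"

tokens = re.split(pattern, data)
print(tokens)
# ['GeeksforGeeks', 'is', 'an', 'awesome', 'website']
```

Note that plain whitespace with no delimiter next to it is not a split point in this pattern.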
List of special characters that need to be escaped when you want to match them literally:
. \ + * ? [ ^ ] $ ( ) { } |
Example:
Python3
import re

newData1 = "GeeksforGeeks, is_an-awesome ! app + too"

# "+" is escaped with a backslash; the raw string (r'...') keeps the backslash intact.
# Here "+" has a space on each side in the data, and those spaces stay in the tokens.
print(re.split(r', |_|-|!|\+', newData1))

newData2 = "GeeksforGeeks, is_an-awesome ! app+too"

# The same pattern also splits on "+" when it has no surrounding spaces.
print(re.split(r', |_|-|!|\+', newData2))
Output:
['GeeksforGeeks', 'is', 'an', 'awesome ', ' app ', ' too']
['GeeksforGeeks', 'is', 'an', 'awesome ', ' app', 'too']
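Escaping by hand is easy to get wrong when the delimiters come from a list or from user input. As a hedged sketch, the standard library function re.escape() can build the pattern instead. The delimiter list and sample string here are made up for illustration.

```python
import re

# Hypothetical delimiter list; "+" and "|" are regex metacharacters.
delimiters = [", ", "_", "-", "!", "+", "|"]

# re.escape() backslash-escapes characters that would otherwise be special,
# so each delimiter is matched literally. "|" joins them as alternatives.
pattern = "|".join(re.escape(d) for d in delimiters)

sample = "a+b|c, d_e-f!g"
tokens = re.split(pattern, sample)
print(tokens)
# ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```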
Note: To learn more about regex, see the documentation for Python's re module.
Method 2: Split multiple characters from a string using re.findall()
This approach is a bit less obvious but often needs less code. It also uses regex, but instead of re.split() it calls re.findall(), which finds every non-overlapping match of the pattern and returns the matches in a list. Rather than describing the delimiters, you describe what a token looks like, so this approach works best when you don't know in advance which characters you need to split on.
Python3
import re

# initializing string
data = "This, is - another : example?!"

# printing original string
print("The original string is : " + data)

# Using re.findall()
# Splitting characters in String
res = re.findall(r"[\w']+", data)

# printing result
print("The list after performing split functionality : " + str(res))
Output:
The original string is : This, is - another : example?!
The list after performing split functionality : ['This', 'is', 'another', 'example']
Here the pattern [\w']+ matches one or more consecutive characters that are letters, digits, underscores (_), or apostrophes, and re.findall() returns every such run in a list. Note: [\w']+ won't split on an underscore (_), because \w matches underscores as well as letters and digits.
Example:
Python3
import re

testData = "This, is - underscored _ example?!"
print(re.findall(r"[\w']+", testData))
Output:
['This', 'is', 'underscored', '_', 'example']
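If underscores should count as delimiters too, the token pattern can exclude them. The sketch below is one possible variation, assuming you want letters, digits, and apostrophes kept together. The sample string extends the one above with a contraction for illustration.

```python
import re

test_data = "This, is - underscored _ example?! Let's_go"

# [^\W_] reads as "a word character that is not an underscore",
# i.e. a letter or a digit. The apostrophe alternative keeps
# contractions such as "Let's" in one piece.
tokens = re.findall(r"(?:[^\W_]|')+", test_data)
print(tokens)
# ['This', 'is', 'underscored', 'example', "Let's", 'go']
```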
Method 3: Split multiple characters from a string using replace() and split()
This is the most basic way of doing the split. It does not use regex, and it needs one replace() call per delimiter, so it becomes unwieldy as the number of delimiters grows, but it works well for simple cases. If you know the characters you want to split on, replace each of them with a space and then call .split():
Python3
# Initial string
data = "Let's_try, this now"

# printing original string
print("The original string is : " + data)

# Using replace() and split()
# Splitting characters in String
res = data.replace('_', ' ').replace(', ', ' ').split()

# Printing result
print("The list after performing split functionality : " + str(res))
Output:
The original string is : Let's_try, this now
The list after performing split functionality : ["Let's", 'try', 'this', 'now']
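When there are many single-character delimiters, chaining replace() calls gets long. A possible alternative that still avoids regex is str.translate() with a table built by str.maketrans(). This is a sketch under the assumption that every delimiter is a single character; the delimiter set and sample string are chosen for illustration.

```python
data = "Let's_try, this-now!"

# Hypothetical set of single-character delimiters.
delimiters = "_,-!"

# str.maketrans(x, y) maps each character in x to the character at the
# same position in y, so every delimiter is mapped to a space here.
table = str.maketrans(delimiters, " " * len(delimiters))

# After translation, a plain whitespace split yields the tokens.
tokens = data.translate(table).split()
print(tokens)
# ["Let's", 'try', 'this', 'now']
```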
Character Classes
Regex cheat sheet of shorthand character classes
Shorthand character class | Represents |
---|---|
\d | Any numeric digit from 0 to 9 |
\D | Any character that is not a numeric digit from 0 to 9 |
\w | Any letter, numeric digit, or the underscore character |
\W | Any character that is not a letter, numeric digit, or the underscore character |
\s | Any space, tab, or newline character |
\S | Any character that is not a space, tab, or newline |
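The short sketch below tries a few of these shorthand classes with re.findall(). The sample string is invented for illustration.

```python
import re

sample = "Order 66:\tship_2 items\n"

# \d+ matches runs of digits.
digits = re.findall(r"\d+", sample)
print(digits)   # ['66', '2']

# \w+ matches runs of letters, digits, and underscores.
words = re.findall(r"\w+", sample)
print(words)    # ['Order', '66', 'ship_2', 'items']

# \s matches a single whitespace character (space, tab, newline).
spaces = re.findall(r"\s", sample)
print(spaces)   # [' ', '\t', ' ', '\n']
```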