I have always believed that SAS and Python make a great team for data scientists, so why not study them together? In this post we will discuss various string functions in SAS and Python.
String Functions in Python 3
I hope you have gone through my previous post, “Python Programming- Strings explained”.
len(): returns the length of a string (number of characters)
str(): returns the string representation of an object
int(): given a string or number, returns an integer
aila = "i hate reading"
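As a quick illustration of the three built-ins listed above, here is a minimal sketch using the same sample string. The variable names other than aila are made up for this example.

```python
# Sample string from the text above
aila = "i hate reading"

# len() counts every character, including the two spaces
n_chars = len(aila)

# str() returns the string representation of an object (here an integer)
as_text = str(2024)

# int() parses a string of digits into an integer
as_number = int("42")

print(n_chars)      # 14
print(as_text)      # 2024
print(as_number)    # 42
```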
There are many string operation methods in Python that can be used to manipulate the data. We are going to use some basic string operations on the data.
The upper() method converts lowercase characters to uppercase characters:
A = "Thriller is the sixth studio album"
How to Search for Substrings
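str.find(sub[, start[, end]])
Return the lowest index in the string where the substring sub is found, such that sub is wholly contained in s[start:end]. Return -1 on failure. (The older string.find(s, sub) function form belongs to Python 2; in Python 3, find is called as a method on the string itself.)
Defaults for start and end, and the interpretation of negative values, are the same as for slices.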
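Here is a small sketch of how find() behaves on the album string used earlier. The search terms are chosen only for illustration.

```python
A = "Thriller is the sixth studio album"

# Lowest index where "th" occurs (the capital "Th" at index 0 does not match)
first_th = A.find("th")

# Start searching from index 13 to find the next occurrence
next_th = A.find("th", 13)

# Restrict the search to the slice A[0:10]; "th" is not inside it
in_slice = A.find("th", 0, 10)

# A substring that is not present returns -1 instead of raising an error
missing = A.find("jazz")

print(first_th, next_th, in_slice, missing)   # 12 19 -1 -1
```

Because find() signals failure with -1, it is safer to compare the result against -1 than to test it for truthiness, since a match at index 0 is also falsy.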
How to make Strings Upper and Lower Case
The str.upper() and str.lower() methods will return a string with all the letters of an original string converted to upper- or lower-case letters. Because strings are immutable data types, the returned string will be a new string.
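A short sketch of both case conversions, reusing the album string from earlier, also shows that the original string is left untouched:

```python
A = "Thriller is the sixth studio album"

# Each call returns a new string; A itself is never modified
upper_a = A.upper()
lower_a = A.lower()

print(upper_a)   # THRILLER IS THE SIXTH STUDIO ALBUM
print(lower_a)   # thriller is the sixth studio album
print(A)         # Thriller is the sixth studio album
```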
How to use join(), split(), and replace() Methods
The str.join(), str.split(), and str.replace() methods are a few additional ways to manipulate strings in Python.
The str.join() method will concatenate two strings, but in a way that passes one string through another.
Let’s create a string:
balloon = "Sammy has a balloon."
Now, let’s use the str.join() method to add whitespace to that string, which we can do with " ".join(balloon). If we print this out, we will see that the new string that is returned has a space added between every character of the first string:
S a m m y   h a s   a   b a l l o o n .
We can also use the str.join() method to return a string that is a reversal of the original string, by printing "".join(reversed(balloon)):
.noollab a sah ymmaS
We did not want to add any part of another string to the first string, so we kept the quotation marks touching with no space in between.
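The two join() calls discussed above can be collected into one runnable sketch. The variable names spaced and backwards are introduced here only for illustration.

```python
balloon = "Sammy has a balloon."

# The separator " " is placed between every character of balloon
spaced = " ".join(balloon)

# An empty separator simply glues the reversed characters back together
backwards = "".join(reversed(balloon))

print(spaced)
print(backwards)   # .noollab a sah ymmaS
```

Note that the original spaces in balloon also receive a separator on each side, so the words in the spaced version end up three spaces apart.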
The str.join() method is also useful to combine a list of strings into a new single string.
Let’s create a comma-separated string from a list of strings:
print(",".join(["sharks", "crustaceans", "plankton"]))
If we want to add a comma and a space between string values in our new string, we can simply rewrite our expression with a whitespace after the comma:
", ".join(["sharks", "crustaceans", "plankton"]).
Just as we can join strings together, we can also split strings up. To do this, we will use the str.split() method. Printing balloon.split() gives:
['Sammy', 'has', 'a', 'balloon.']
The str.split() method returns a list of strings that are separated by whitespace if no other parameter is given.
We can also use str.split() to remove certain parts of an original string. For example, let’s remove the letter a from the string by printing balloon.split("a"):
['S', 'mmy h', 's ', ' b', 'lloon.']
Now the letter a has been removed and the strings have been separated where each instance of the letter a had been, with whitespace retained.
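The splitting behaviour described above can be checked with a short sketch. The final step, joining the pieces back together, is an addition for illustration and shows how split() can be combined with join() to drop a character entirely.

```python
balloon = "Sammy has a balloon."

# With no argument, split() breaks the string on runs of whitespace
words = balloon.split()

# With an argument, the string is cut at every occurrence of that substring
pieces = balloon.split("a")

# Gluing the pieces back with an empty separator removes every "a"
without_a = "".join(pieces)

print(words)       # ['Sammy', 'has', 'a', 'balloon.']
print(pieces)      # ['S', 'mmy h', 's ', ' b', 'lloon.']
print(without_a)
```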
The str.replace() method can take an original string and return an updated string with some replacement.
Let’s say that the balloon that Sammy had is lost. Since Sammy no longer has this balloon, we will change the substring "has" from the original string to "had" in a new string by printing balloon.replace("has", "had"):
Within the parentheses, the first substring is what we want to be replaced, and the second substring is what we are replacing that first substring with. Our output will look like this:
Sammy had a balloon.
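A minimal sketch of the replacement follows. The optional third argument (a maximum number of replacements) is not covered in the text above and is shown only as a small extra; the string "aaa" is a made-up example.

```python
balloon = "Sammy has a balloon."

# First argument: substring to replace; second argument: its replacement
updated = balloon.replace("has", "had")

# An optional third argument limits how many occurrences are replaced
first_only = "aaa".replace("a", "b", 1)

print(updated)      # Sammy had a balloon.
print(balloon)      # Sammy has a balloon.  (the original is unchanged)
print(first_only)   # baa
```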
Using the string methods str.join(), str.split(), and str.replace() will provide you with greater control to manipulate strings in Python.
SAS String / Character Functions quick revision
- use a LENGTH statement to set the desired length of a character variable
- use the concatenation operator (||) to join two or more character strings
- use the COMPBL function to convert multiple blanks in a character string to a single blank
- use the COMPRESS function to remove characters from a string
- use the VERIFY function to check that a character variable contains only the values you expect (it returns the position of the first character that is not in a given list)
- use the TRIM function to remove the trailing blanks from a character string
- use the SUBSTR function to select a subset of consecutive characters from a larger string
- use the SUBSTR function on the left-hand side of an equal sign
- use the SUBSTR function to unpack a string of characters into its individual characters
- use the INPUT function to convert a character variable to a numeric variable
- use the PUT function to convert a numeric variable to a character variable
- use the SCAN function to parse a string and/or extract part of a string
- use the INDEX and INDEXC functions to locate a position of one string within another string
- use the UPCASE function to change lowercase letters to uppercase letters, and use the LOWCASE function to change uppercase letters to lowercase letters
- use the PROPCASE function to capitalize the first letter of each word
- use the TRANWRD function to replace all occurrences of a word (substring) in a string
- use the CATS function to strip leading and trailing blanks before joining two or more strings
- use the CATX function to strip leading and trailing blanks, and then join two or more strings with a specified character inserted between the strings
- use the LENGTHC function to return the storage length of a character variable
- use the LENGTH and/or LENGTHN functions to determine the length of a character variable not counting trailing blanks
- use the COUNT function to count the number of times a particular substring appears in a string
- use the COUNTC function to count the number of times one or more characters appear in a string
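Since this post is about using SAS and Python together, here is a rough sketch of how several of the SAS functions above might be approximated in Python. These are loose counterparts, not exact equivalents: SAS pads character variables with trailing blanks and counts positions from 1, while Python strings are not padded and are indexed from 0. The sample strings and variable names are invented for the example.

```python
import re

text = "data  science with SAS"

compbl = re.sub(r" {2,}", " ", text)      # COMPBL: collapse runs of blanks
compress = text.replace(" ", "")          # COMPRESS: remove blanks
trimmed = "SAS   ".rstrip()               # TRIM: drop trailing blanks
substr = compbl[0:4]                      # SUBSTR(s, 1, 4)
scan = compbl.split()[1]                  # SCAN(s, 2): second word
index = compbl.find("science") + 1        # INDEX: +1 for SAS-style position
upcase = compbl.upper()                   # UPCASE
lowcase = compbl.lower()                  # LOWCASE
propcase = compbl.title()                 # PROPCASE (approximately)
tranwrd = compbl.replace("SAS", "Python") # TRANWRD

parts = [" a ", "b ", " c"]
cats = "".join(p.strip() for p in parts)  # CATS: strip, then join
catx = "-".join(p.strip() for p in parts) # CATX: strip, join with separator

count = "banana".count("an")              # COUNT: substring occurrences
countc = sum(ch in "an" for ch in "banana")  # COUNTC: chars from a set

number = int("123")                       # INPUT: character to numeric
chars = str(123)                          # PUT: numeric to character

print(compbl, compress, scan, index, propcase, catx, count, countc)
```

One caveat worth keeping in mind: find() returns -1 when nothing matches, so the + 1 adjustment yields 0 in that case, which happens to match what INDEX returns for "not found".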