Extracting Text From Variable Length Strings

After I wrote about how to extract characters from the left, right or middle of a text string in Excel, I received a few inquiries about extracting text from strings which don’t seem to have a fixed size, and vary in length.

For example, if the data is something like this

HQ-1022-PORT
LONDON-4053-LANDED
HOUSTON-2488-WEST
SINGAPORE-3133-LEEDON

You want to separate out the first word (HQ, LONDON, HOUSTON, SINGAPORE etc.)

Can’t Use LEFT Function
You can’t use a LEFT function, because a left function needs to know the number of characters you want to extract. And this itself is variable… There are 2 characters in HQ, 6 in London, 7 in Houston etc.

One of the ways I have discovered is to find the location of the first Hyphen (-), and extract all characters from the left of it.

Want to Improve Your Excel Skills?

Learn the Key Features of Excel Quickly & Easily, by Joining the
Online Training on Basic / Intermediate Excel.

To Get Most out of Excel, Learn the  Pivot Table techniques in our
Pivot Table Masterclass Training
!


Finding the location of the First Hyphen
We can use the FIND function to find the location of the first hyphen.

If A2 contains HQ-1022-PORT, we can use the formula as:

=FIND(“-“,A2)

The answer would be 3. This means that the hyphen is the third character in the string. This is perfect. Now we know that we need n-1, that means 3-1, which is 2. We need 2 characters from the Left of this number.

So we can write another formula as

=LEFT(A2,  FIND(“-“,A2)-1 )

The resulting answer would be HQ. Copy the formula down to other cells, and you should be able to extract HOUSTON, SINGAPORE etc. without much hassle, and without having to write another, different formula.

One size fits all formula to extract a variable length string from a piece of text. And pretty easily too!

Do you have any other method? How would you solve this challenge?

Please post your answers in the comments below.

Cheers,
Vinai Prakash

P.S. – I am teaching a 2 day course on Data Interpretation & Analysis in Singapore on Feb 9-10, 2015, and another on March 9-10, 2015. If you would like to learn techniques to analyze data and create management reports, you can attend this program.

Awaiting you and your innovative comments below … 🙂 – Vinai

Are you facing any problem in using Excel? Any Question?

You have come to the right place. Tell us your needs. We’ll be glad to help you!

3 thoughts on “Extracting Text From Variable Length Strings”

  1. I have many cells that all very in length. I want to pull text AFTER a certain point (mid function??) but also omit text at the end (conveniently, the last 4 characters which in my case is “.pdf”.). I know there must be a relatively simple way to do this but whats throwing me off is the varying lengths of the middle text in each cell that I want to extract!

    An example cell:

    bv-1-25-system bv-tr-mt1-43-ats-2 rev1.pdf

    So, I want everything AFTER the word ‘system’ but everything BEFORE ‘.pdf’. Which I can make happen with this formula: =MID(A1,FIND(“SYSTEM”,A1)+7,100) BUT I can’t seem to get it to cut off the ‘.pdf’ portion. Again keeping in mind that every cell varies in length…

    Any help with this?

    Reply
    • hi Keegan,

      You are unable to get it with your formula because you are picking the next 100 characters after the word system.

      To achieve what you need, try this formula
      =MID(A1,FIND(“system”,A1)+7,LEN(A1)-4-FIND(“system”,A1)-6)

      It will give you bv-tr-mt1-43-ats-2 rev1 as the output.

      Hope this helps…
      Cheers,
      Vinai

      Reply
      • Vinai – Thank you so much! This is exactly what I was looking for! I knew I shouldn’t have been using the the +100 but couldn’t figure out how to get to this! So thanks again…

        Now, for my next question! It turns out I’ve come across some files that don’t have the “rev1” bit and i’ve decided it’d be better to omit this part all together as well. so for SOME files simply omitting the ‘.pdf’ portion will be fine, but others will require removing ‘_rev1.pdf’.

        Any insight on this?

        Thanks again in advance, and thanks for the super quick response to my original question!

        Reply

Leave a Comment