python - Regex to split a sentence into simple English words


I have a sentence and wish to extract the words from it. I define a word as [a-zA-Z]+, and a word may contain an apostrophe. An apostrophe on its own is not a word. I am programming in Python 3.

Input text:

don't-thread 0 '' ' 'on \r\nme! 

Should give:

don't thread on  me  

with regard to the regex splitting. I will then translate the words as follows using Python:

don't -> dont  thread -> thread on -> on me -> me 

More input:

   ''kay', said. 'what're goin' do?' 

The regex split and Python translation should give:

   ''kay' -> kay     ->    said -> said    'what're -> whatre    ->    goin' -> going    ->    -> 

Here's what I use:

\b(\S+)\b

which matches a lot more than I'm interested in.

Update:

Words can begin with an apostrophe, such as in "get 'em!"

You can try this regex:

[a-zA-Z]+(?:'[a-zA-Z]+)*

which should work on most regex engines. Some of the groups can be shortened depending on the specifics of the regex engine, but that's the more general regex.

It makes sure that an apostrophe is surrounded by letters.

Edit: to allow initial apostrophes, you can add '? at the beginning:
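
As a rough illustration of how this could be used from Python 3, here is a minimal sketch with re.findall on the first sample input from the question. The names WORD_RE and words are only placeholders chosen for this example.

```python
import re

# Letters, optionally followed by groups of "apostrophe + letters".
# An apostrophe only matches when it has letters on both sides.
WORD_RE = re.compile(r"[a-zA-Z]+(?:'[a-zA-Z]+)*")

text = "don't-thread 0 '' ' 'on \r\nme! "

# The only group is non-capturing, so findall returns the whole matches.
words = WORD_RE.findall(text)
print(words)  # ["don't", 'thread', 'on', 'me']
```

Digits, lone apostrophes and punctuation are simply skipped, because nothing in the pattern can match them.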

'?[a-zA-Z]+(?:'[a-zA-Z]+)*

regex101 demo
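
To see what the optional leading apostrophe changes, here is a small sketch in Python 3. It also strips apostrophes afterwards, which is only an assumption about the "translation" step (it covers don't -> dont but not a dictionary-style mapping such as goin' -> going). LEADING_RE and strip_apostrophes are hypothetical names used for illustration.

```python
import re

# Same pattern as before, but a single apostrophe may precede the first letter.
LEADING_RE = re.compile(r"'?[a-zA-Z]+(?:'[a-zA-Z]+)*")


def strip_apostrophes(word):
    """Assumed translation step: drop every apostrophe from a matched word."""
    return word.replace("'", "")


tokens = LEADING_RE.findall("get 'em!")
print(tokens)  # ['get', "'em"]

translated = [strip_apostrophes(t) for t in tokens]
print(translated)  # ['get', 'em']
```

Note that a trailing apostrophe, as in goin', is still not part of the match, since the pattern only accepts an apostrophe at the start or between letters.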

