python - Regex to split a sentence into simple English words
I have a sentence and wish to extract the words from it. I define a word as [a-zA-Z]; a word may contain an apostrophe, but an apostrophe on its own is not a word. I am programming in Python 3.
Input text:
don't-thread 0 '' ' 'on \r\nme!
should give:
don't thread on me
with regard to the regex splitting. I then translate the words as follows using Python:
don't -> dont
thread -> thread
on -> on
me -> me
More input:
''kay', said. 'what're goin' do?'
After the regex split and the Python translation, this should give:
''kay' -> kay
said -> said
'what're -> whatre
goin' -> going
Here's what I use:
\b(\S+)\b
which matches a lot more than I'm interested in.
Update:
Words can begin with an apostrophe, such as in "get 'em!"
You can try this regex:
[a-zA-Z]+(?:'[a-zA-Z]+)*
which should work on most regex engines. Some of the groups can be shortened depending on the specifics of the regex engine, but this is the more general form.
It makes sure that every apostrophe is surrounded by letters.
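As a quick check in Python 3, the pattern can be run over the two sample inputs from the question with re.findall. This is only a minimal sketch: it assumes a "word" means ASCII letters in either case, and the names WORD_RE and extract_words are made up for illustration.

```python
import re

# One or more letters, optionally followed by apostrophe-plus-letters groups,
# so an apostrophe is only kept when it has letters on both sides.
# Assumption: "word" means ASCII letters, upper or lower case.
WORD_RE = re.compile(r"[a-zA-Z]+(?:'[a-zA-Z]+)*")


def extract_words(text):
    """Return every match of WORD_RE in text, in order of appearance."""
    return WORD_RE.findall(text)


print(extract_words("don't-thread 0 '' ' 'on \r\nme!"))
# ["don't", 'thread', 'on', 'me']
print(extract_words("''kay', said. 'what're goin' do?'"))
# ['kay', 'said', "what're", 'goin', 'do']
```

Note that leading and trailing apostrophes (as in ''kay' and goin') are left out of the matches, because an apostrophe is only consumed when letters follow it inside a word.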
Edit: to allow initial apostrophes, you can add '? at the beginning:
'?[a-zA-Z]+(?:'[a-zA-Z]+)*
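Here is a rough sketch of how this variant might be combined with the translation step from the question. It assumes the translation only needs to drop apostrophes (as in don't -> dont); a mapping such as goin' -> going would need a lookup of its own and is not attempted here. The names LEADING_APOSTROPHE_RE and strip_apostrophes are hypothetical.

```python
import re

# Same pattern as before, with an optional apostrophe allowed at the start.
# Assumption: "word" means ASCII letters, upper or lower case.
LEADING_APOSTROPHE_RE = re.compile(r"'?[a-zA-Z]+(?:'[a-zA-Z]+)*")


def strip_apostrophes(word):
    """Hypothetical translation step: remove every apostrophe from a word."""
    return word.replace("'", "")


words = LEADING_APOSTROPHE_RE.findall("get 'em!")
print(words)
# ['get', "'em"]
print([strip_apostrophes(w) for w in words])
# ['get', 'em']

# An opening quote mark directly before a word is also picked up as a
# leading apostrophe, so 'on comes back with its quote attached.
print(LEADING_APOSTROPHE_RE.findall("don't-thread 0 '' ' 'on \r\nme!"))
# ["don't", 'thread', "'on", 'me']
```

Because the pattern cannot tell an opening quote from a leading apostrophe, stripping apostrophes afterwards is one way to get back to don't -> dont and 'on -> on.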