Python - Regex to split a sentence into simple English words
I have a sentence and wish to extract the words from it. I define a word as [a-zA-Z]; a word may also contain an apostrophe. An apostrophe on its own is not a word. I am programming in Python 3.
Input text:
don't-thread 0 '' ' 'on \r\nme!
This should give:
don't thread on me
That is with regard to the regex splitting. I then translate the words as follows using Python:
don't -> dont
thread -> thread
on -> on
me -> me
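For this input, the translation step amounts to dropping the apostrophes from each extracted word. A minimal sketch, assuming that is all the translation has to do here (the helper name `strip_apostrophes` is made up for illustration):

```python
def strip_apostrophes(word):
    """Remove every apostrophe from a word, e.g. "don't" -> "dont"."""
    return word.replace("'", "")


# Words as they would come out of the regex split.
words = ["don't", "thread", "on", "me"]
for word in words:
    print(word, "->", strip_apostrophes(word))
```

A mapping such as goin' -> going needs more than apostrophe removal and is not covered by this sketch.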
More input:
''kay', said. 'what're goin' do?'
The regex split, followed by the Python translation, should give:
''kay' -> kay -> said -> said 'what're -> whatre -> goin' -> going -> ->
Here's what I use:
\b(\S+)\b
which matches a lot more than I'm interested in.
Update:
Words can begin with an apostrophe, such as "get 'em!"
You can try this regex:
[a-zA-Z]+(?:'[a-zA-Z]+)*
which should work on most regex engines. Some of the groups can be shortened depending on the specifics of the regex engine, but that's the more general regex.
It makes sure that every apostrophe is surrounded by letters.
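As a rough illustration in Python 3, the pattern can be passed to `re.findall`, which returns whole matches because the only group is non-capturing. The names `WORD_RE` and `extract_words` are hypothetical, chosen for this sketch:

```python
import re

# Letters, optionally followed by groups of (apostrophe + letters),
# so an apostrophe only matches when it sits between letters.
WORD_RE = re.compile(r"[a-zA-Z]+(?:'[a-zA-Z]+)*")


def extract_words(text):
    """Return every match of WORD_RE in text, in order."""
    return WORD_RE.findall(text)


print(extract_words("don't-thread 0 '' ' 'on \r\nme!"))
# ["don't", 'thread', 'on', 'me']
```

The stray apostrophes and the digit are skipped because no match can start or end on them.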
Edit: to allow initial apostrophes, you can add '? at the beginning:
'?[a-zA-Z]+(?:'[a-zA-Z]+)*
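A short sketch of how this variant behaves with `re.findall` in Python 3 (the name `LEADING_APOSTROPHE_RE` is made up for the example):

```python
import re

# Same pattern, with one optional apostrophe allowed before the first letter.
LEADING_APOSTROPHE_RE = re.compile(r"'?[a-zA-Z]+(?:'[a-zA-Z]+)*")

print(LEADING_APOSTROPHE_RE.findall("get 'em!"))
# ['get', "'em"]

print(LEADING_APOSTROPHE_RE.findall("don't-thread 0 '' ' 'on \r\nme!"))
# ["don't", 'thread', "'on", 'me']
```

Note the trade-off visible in the second call: an opening quotation mark looks the same as a leading apostrophe, so 'on comes back with the quote attached. Removing apostrophes afterwards, as in the translation step of the question, takes care of that.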