python - Regex for name extraction on text file -




I've got a plain text file containing a list of authors and abstracts, and I'm trying to extract just the author names to use for network analysis. The text follows this pattern and contains 500+ abstracts:

2010 - Nuclear Forensics of Special Nuclear Material at Los Alamos: 3 Recent Studies Purchase article David L. Gallimore, Los Alamos National Laboratory Katherine Garduno, Los Alamos National Laboratory Russell C. Keller, Los Alamos National Laboratory Nuclear forensics of special nuclear materials is a highly specialized field because there are few analytical laboratories in the world that can safely handle nuclear materials, perform high accuracy and precision analysis using validated analytical methods.

I'm using Python 2.7.6 with the re library.

I've tried

    regex = re.compile(r'( [A-Z][a-z]*,+)')
    print regex.findall(text)

which pulls out last names only, plus any capitalized words that come right before commas in the abstracts.

Using (r'.*,') does extract the full name, but it also grabs the entire abstract, which I don't need.
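To make the two behaviors concrete, here is a minimal sketch that runs both patterns on a shortened, made-up sample in the same shape as the abstracts. The sample string and its capitalization are assumptions, and the code uses the `print()` form so it runs on Python 2.7 and Python 3 alike.

```python
import re

# Shortened sample in the same shape as the abstracts (assumed text and casing).
text = ("David L. Gallimore, Los Alamos National Laboratory "
        "Katherine Garduno, Los Alamos National Laboratory "
        "Few laboratories can handle nuclear materials, perform analysis.")

# Attempt 1: a space, one capitalized word, then one or more commas.
# Only the word directly before the comma can match, so first names are lost.
last_names = re.findall(r'( [A-Z][a-z]*,+)', text)

# Attempt 2: '.*' is greedy, so the match runs from the start of the line
# all the way to the LAST comma it can find.
greedy = re.findall(r'.*,', text)

print(last_names)
print(greedy)
```

On this sample the first pattern returns only the surnames (with the leading space and trailing comma still attached), while the second returns a single string that swallows everything up to the final comma.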

Maybe regex is the wrong approach? Any help or ideas are appreciated.

If you are trying to match names, try to match the entire substring instead of just part of it.

You can use the following regular expression and modify it if needed.

>>> regex = re.compile(r'\b([A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+),')
>>> print regex.findall(text)
['David L. Gallimore', 'Katherine Garduno', 'Russell C. Keller']
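As a self-contained sketch of the same idea, the snippet below applies that pattern to a shortened version of the sample abstract. The name `NAME_RE` and the trimmed sample text are illustrative assumptions, and `print()` is used so the code runs on both Python 2.7 and Python 3.

```python
import re

# Capitalized first name, optional middle initial ("L."), capitalized
# last name, then a comma. Only the name itself is captured.
NAME_RE = re.compile(r'\b([A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+),')

# Shortened sample in the same shape as the abstracts (assumed casing).
text = (
    "2010 - Nuclear Forensics of Special Nuclear Material at Los Alamos: "
    "3 Recent Studies Purchase article "
    "David L. Gallimore, Los Alamos National Laboratory "
    "Katherine Garduno, Los Alamos National Laboratory "
    "Russell C. Keller, Los Alamos National Laboratory "
    "Nuclear forensics of special nuclear materials is a highly specialized field."
)

names = NAME_RE.findall(text)
print(names)
```

The pattern relies on each author being written as "First Last," or "First M. Last," directly before the affiliation. Names outside that shape, such as hyphenated or accented ones, fall outside `[A-Z][a-z]+` and would need a broader character class.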

