python - Regex for name extraction on text file
I have a plain text file containing a list of authors and abstracts, and I'm trying to extract the author names to use for network analysis. The text follows this pattern and contains 500+ abstracts:
2010 - Nuclear Forensics of Special Nuclear Material at Los Alamos: Three Recent Studies Purchase this article David L. Gallimore, Los Alamos National Laboratory Katherine Garduno, Los Alamos National Laboratory Russell C. Keller, Los Alamos National Laboratory Nuclear forensics of special nuclear materials is a highly specialized field because there are few analytical laboratories in the world that can safely handle nuclear materials, perform high accuracy and precision analysis using validated analytical methods.
I'm using Python 2.7.6 with the re library.
I've tried
import re
regex = re.compile(r'( [A-Z][a-z]*,+)')
print regex.findall(text)
which pulls out the last names only, plus any capitalized words that come before commas in the abstracts.
Using (r'.*,')
works to extract the full name, but it also grabs the entire abstract, which I don't need.
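A small sketch of why both attempts misfire. The sample string is a shortened stand-in for the real file, and its capitalization and wording are assumptions; `print()` is used so the snippet runs on Python 2.7 and 3 alike.

```python
import re

# Shortened, hypothetical sample in the same shape as the real text.
text = ("Purchase this article David L. Gallimore, Los Alamos National Laboratory "
        "Katherine Garduno, Los Alamos National Laboratory "
        "Outside Los Alamos, few laboratories can safely handle nuclear materials.")

# Attempt 1: a space, ONE capitalized word, then one or more commas.
# Only the word directly before each comma can match, so first names and
# middle initials are never captured, and abstract words slip in.
last_names = re.findall(r'( [A-Z][a-z]*,+)', text)
print(last_names)

# Attempt 2: '.*' is greedy, so it runs from the start of the line
# to the LAST comma on that line, swallowing everything in between.
greedy = re.findall(r'.*,', text)
print(len(greedy))
```

The first pattern returns `' Gallimore,'`, `' Garduno,'` and `' Alamos,'`; the second returns a single match that ends at the last comma in the string.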
Maybe regex is the wrong approach? Any help or ideas are appreciated.
If you are trying to match names, try to match the entire substring instead of just part of it.
You can use the following regular expression and modify it if needed.
>>> regex = re.compile(r'\b([A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+),')
>>> print regex.findall(text)
['David L. Gallimore', 'Katherine Garduno', 'Russell C. Keller']
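For readers who want to adapt the pattern, here is the same expression spelled out with `re.VERBOSE`, one piece per line. This is a sketch, not a general name parser: the constant `NAME` and the sample string are hypothetical, and the capitalization of the sample is assumed.

```python
import re

# Same pattern as above, written in verbose mode so each part can be commented.
# [ ] is a literal space (bare whitespace is ignored under re.VERBOSE).
NAME = re.compile(r"""
    \b                   # start at a word boundary
    (                    # capture the full name, without the trailing comma
      [A-Z][a-z]+        # first name: one capital, then lowercase letters
      (?:[ ][A-Z]\.)?    # optional middle initial followed by a period
      [ ][A-Z][a-z]+     # a space, then the last name
    )
    ,                    # the comma separating the name from the affiliation
""", re.VERBOSE)

# Hypothetical sample in the same shape as the question's text.
text = ("2010 - Nuclear Forensics of Special Nuclear Material at Los Alamos: "
        "Three Recent Studies Purchase this article "
        "David L. Gallimore, Los Alamos National Laboratory "
        "Katherine Garduno, Los Alamos National Laboratory "
        "Russell C. Keller, Los Alamos National Laboratory "
        "Nuclear forensics of special nuclear materials is a highly specialized field.")

names = NAME.findall(text)
print(names)
```

Note that the pattern accepts any two capitalized words followed by a comma, so a phrase such as "Los Alamos," inside an abstract would also be returned. Hyphenated surnames or names containing apostrophes would be missed unless the character classes are widened.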