  #1 08-31-2013, 04:53 PM
Cartographer
Extract e-mails from a BIG DB dump file

CREDITS TO AUTHOR



Let's say you have a 2 GB .sql DB dump file and you're only interested in getting the users' e-mail addresses from it. What's the best way to do it?


1. Find out the DB's structure
Since you don't need the whole DB, you'll save time and server load by working only with the user table from this point on.

First, you need the list of tables in the dump. Each table in the dump starts with a "Table structure" comment line, so grep can pull the names out:
Code:
grep 'Table structure' somesiteDB.sql | cut -d\` -f2 > dbstruct.txt
Now open dbstruct.txt and look for the user table (_user, users, _members, etc.). The list will look similar to this (this is a vB DB structure):
Code:
.
.
.
prefix_user
prefix_useractivation
.
.
.
So you've found your user table (prefix_user). Write down the table that follows it too (prefix_useractivation), because you'll need it in the next command.
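
If you'd rather not scroll through the list by hand, something like the sketch below can look up the following table for you. It builds a tiny sample dbstruct.txt in a temp directory just so it runs on its own; in practice you'd point it at your real dbstruct.txt. The table names and the `user_table`/`next_table` variables are only illustrative, and `-A1` is a GNU/BSD grep option, not plain POSIX.

```shell
# Sketch: find the table that comes right after the user table in the list.
# Assumption: this sample list stands in for the real dbstruct.txt.
workdir=$(mktemp -d)
printf '%s\n' prefix_thread prefix_user prefix_useractivation prefix_userban \
  > "$workdir/dbstruct.txt"

user_table=prefix_user   # hypothetical name; adjust to your dump

# -x: match whole lines only, -F: treat the name as a fixed string,
# -A1: also print the one line after the match.
next_table=$(grep -xF -A1 "$user_table" "$workdir/dbstruct.txt" | tail -n 1)
echo "$next_table"
```

Note that if the user table happens to be the last one in the list, this just prints the user table again, so check the result before using it.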


2. Extract user table
We'll use sed for this. Substitute your own table names for prefix_user and prefix_useractivation and leave the rest of the command unchanged:
Code:
sed -ne '/- Table structure for table `prefix_user`/,/- Table structure for table `prefix_useractivation`/p' somesiteDB.sql > usertable.sql
Basically, this copies everything from the "Table structure" line of prefix_user through the "Table structure" line of prefix_useractivation.
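
Before running this on a 2 GB file, it may be worth trying the range on a tiny sample first. The sketch below does that with a made-up four-table dump (all table names and rows are invented for the demo). It matches the backticks around the table name literally, so a similarly named table such as prefix_userban can't start a second range, and it quits at the end marker so sed doesn't have to read the rest of the file.

```shell
# Sketch: test the sed range on a small fake dump.
# Assumption: the sample data below only imitates a mysqldump-style file.
workdir=$(mktemp -d)
cat > "$workdir/sample.sql" <<'EOF'
-- Table structure for table `prefix_thread`
INSERT INTO `prefix_thread` VALUES (1,'hello');
-- Table structure for table `prefix_user`
INSERT INTO `prefix_user` VALUES (1,'alice','alice@example.com');
-- Table structure for table `prefix_useractivation`
INSERT INTO `prefix_useractivation` VALUES (1,'x');
-- Table structure for table `prefix_userban`
INSERT INTO `prefix_userban` VALUES (1,'bob@example.org');
EOF

# Single quotes keep the backticks literal (no command substitution).
start='Table structure for table `prefix_user`'
end='Table structure for table `prefix_useractivation`'

# Print the range, then quit as soon as the end marker has been printed.
sed -n "/$start/,/$end/p; /$end/q" "$workdir/sample.sql" > "$workdir/usertable.sql"
cat "$workdir/usertable.sql"
```

On the sample this keeps only the three lines from the prefix_user header through the prefix_useractivation header.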


3. Extract e-mails
OK, this is the last and easiest step. A Perl one-liner should do it just fine:
Code:
perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' usertable.sql | sort -u > emails.txt
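
If Perl isn't available, roughly the same thing can be sketched with grep. This is an alternative, not the command above: it uses `grep -oE` (a GNU/BSD option) with a similar pattern, and additionally lowercases the addresses so that Alice@Example.com and alice@example.com count as one. The sample rows are invented so the snippet runs on its own.

```shell
# Sketch: extract unique e-mail addresses with grep instead of perl.
# Assumption: this sample usertable.sql stands in for the real extract.
workdir=$(mktemp -d)
cat > "$workdir/usertable.sql" <<'EOF'
INSERT INTO `prefix_user` VALUES (1,'alice','Alice@Example.com'),(2,'bob','bob@example.org');
INSERT INTO `prefix_user` VALUES (3,'alice2','alice@example.com');
EOF

# -o prints only the matched text, one match per line; -E enables extended regex.
grep -oE '[[:alnum:]_.-]+@[[:alnum:]_.-]+[[:alnum:]_]+' "$workdir/usertable.sql" \
  | tr '[:upper:]' '[:lower:]' \
  | sort -u > "$workdir/emails.txt"
cat "$workdir/emails.txt"
```

Like the Perl pattern, this is a loose match and not a real e-mail validator, so expect the odd false positive in the output.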