Email Extractor


Advanced Options

Overview

The advanced options are constantly evolving, and any significant changes to the program will likely be made here. Experimenting blindly with these options is not advised; a full understanding is required in order to extract emails effectively. The options provided in this section allow for complete customization of extraction sessions.

Search Criteria

I briefly went over the meaning of Search Criteria and its role in the following entry within this guide:

What is Social Email Extractor? > How it works

I would advise you to quickly review that section to better understand the nature of Search Criteria. Now that I think about it, it may have been more appropriate to call it SMTP Criteria rather than Search Criteria. Nevertheless, this is a critical component of SEE and extraction sessions. The initial version of SEE had what you would now see as the default list of Search Criteria. This evolved into other lists as users contributed their ideas. You can add whichever SMTP providers you like.

How to Find SMTP Providers

Choosing specific Search Criteria comes in handy when you want to zero in on emails from a specific country. The current lists are more or less designed for extraction across North America. If the geographic nature of your lists isn’t an issue, then selecting Combine All is probably your best bet for larger and more definitive lists. This method will, however, take much longer. If you just want large lists quickly, select (or leave) the default Search Criteria values. You must have some sort of Search Criteria in order to conduct an extraction.

General Interest Search Criteria

This type of Search Criteria list is designed to extract what are known as Mom and Pop emails. Think of the opposite of major SMTP providers and major companies: a Mom and Pop or General Interest email address would belong to the old video store down the road that is not part of a chain. While some users welcome Gmail, Yahoo and AOL email accounts, others may want to focus only on those that are General Interest. Selecting this list of Search Criteria will assist in scraping emails from non-major SMTP providers. If this is your intent, then you should use it in conjunction with the PSF function and read about removing the domain name (both discussed later in this guide).

Intelligent Search Criteria

The Intelligent Search Criteria option, or ISC, was created to further expand the return of emails scraped through SEE. Sometimes when people post their email addresses they write them in a non-traditional way in an attempt to prevent them from being scraped. For example:

johnsmith [at] gmail.com

Note: Do not attempt to alter, add or change the Intelligent Search Criteria unless you've read this guide and fully understand how it works.

With ISC enabled, SEE is able to find and convert these email addresses into their proper format. The current default ISC list is as follows:

(at) gmail (dot) com
(at) yahoo (dot) com
(at) hotmail (dot) com

To understand how ISC works for an individual ISC entry, I'll explain what happens to the first one on the list, in this case '(at) gmail (dot) com'.

SEE will scrape pages as it usually does; however, with ISC selected it will also search out and scrape pages that specifically match this entry in conjunction with your keyword(s). Any matching addresses on the returned pages will automatically be converted.

In this specific case SEE will convert the following to the '@' symbol:

[at]
[AT]
(at)
(AT)
[ at ]
( at )
_at_
_AT_
at gmail.com
AT gmail.com

After the scraped pages have been converted, and assuming some other internal 'SEE engine' criteria are fulfilled, the emails will be saved. Similarly, '(dot)' should be used (by itself or in conjunction with '(at)').

For every "(dot)" (minus the quotes) in the ISC list, SEE will scrape pages containing the following within email addresses:

[dot]
[DOT]
(dot)
(DOT)
[ dot ]
( dot )
_dot_
_DOT_
DOT com
dot com

...and change any instances noted above to "." (minus the quotes) respectively.

This in turn converts 'hidden' email addresses into ones that are readable.
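The conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SEE's actual engine: the patterns are modelled on the variations listed in this section, and the function name `convert_isc` and its `provider` and `tld` parameters are hypothetical.

```python
import re

# Assumed patterns, modelled on the variations listed in this section.
# IGNORECASE also accepts mixed case such as "[At]", which the guide does not list.
AT_TOKEN = re.compile(r"\s*(?:\[\s*at\s*\]|\(\s*at\s*\)|_at_)\s*", re.IGNORECASE)
DOT_TOKEN = re.compile(r"\s*(?:\[\s*dot\s*\]|\(\s*dot\s*\)|_dot_)\s*", re.IGNORECASE)


def convert_isc(text, provider="gmail", tld="com"):
    """Rewrite obfuscated '<name> (at) <provider> (dot) <tld>' forms as plain addresses."""
    # Bracketed, parenthesised and underscored "dot" variants become ".".
    text = DOT_TOKEN.sub(".", text)
    # A bare "dot" is only converted directly before the TLD ("dot com").
    text = re.sub(rf"\s+dot\s+(?={re.escape(tld)}\b)", ".", text, flags=re.IGNORECASE)
    # Bracketed, parenthesised and underscored "at" variants become "@".
    text = AT_TOKEN.sub("@", text)
    # A bare "at" is only converted directly before the provider domain ("at gmail.com").
    domain = re.escape(f"{provider}.{tld}")
    text = re.sub(rf"\s+at\s+(?={domain}\b)", "@", text, flags=re.IGNORECASE)
    return text


print(convert_isc("johnsmith [at] gmail.com"))
```

Restricting the bare "at" and "dot" forms to positions directly before the provider domain keeps ordinary sentences mostly intact, although a phrase such as "meet me at gmail.com" would still be rewritten.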

Creating Your Own ISC

WARNING: It's not necessary to include the variations above ([at], _at_, etc. or (dot), _DOT_, etc.) but rather only one instance of "(at)" (minus the quotes) and "(dot)" (minus the quotes) for any given search criteria that you add.

That's why you'll see only one entry for Gmail, with one "(at)" and one "(dot)":

(at) gmail (dot) com

and NOT:

[AT] gmail.com
(at) gmail.com
(AT) gmail.com
[ at ] gmail.com
( at ) gmail.com
_at_ gmail.com
_AT_ gmail.com
at gmail.com
AT gmail.com

That being said if you wanted to add "AOL" into the ISC you would simply add the following into the ISC box:
(at) AOL (dot) com

But that would be it. It would be unnecessary to add the following (combined with the above):
(at) AOL.com
_at_AOL.com
at_aol_dot_com
etc.

IMPORTANT: The ISC component of SEE only recognizes (at) and (dot) (case-sensitive). So doing any other variation will be pointless - the ISC component was designed to convert multiple variations (as noted above) based on only '(at)' or '(dot)' being in the ISC list.

These two elements enable SEE to recognize an entry as an ISC and treat it as such during the extraction process. Because the Intelligent Search Criteria option is relatively new, the default list is short. During my initial trials it proved to be very effective. The default list will certainly grow in the future. In the meantime, I encourage you to experiment with the ISC function while keeping to the guidelines above.

Pre-Scrape Filter Function

The Pre-Scrape Filter function (PSF) is designed to exclude specific SMTP providers. It should be used in conjunction with your Search Criteria list. If you would like to add your own SMTP provider to exclude from scrapes, simply add a hyphen followed by the domain, as in this example:

-gmail.com

Note: The PSF function does not guarantee that the SMTP provider will not appear within your email lists.

While the PSF function does an efficient job of excluding specific domains most of the time, users are encouraged to use the White List/Filter to further ensure the removal of specific domains.

A default list of major SMTP providers is provided; however, users can change this list as they see fit for extractions.

If you’re intending to use this feature with the General Interest Search Criteria List then you’ll also want to consider reading up on the Remove Domain Name in Search Results which you’ll find below.
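As a rough illustration of the hyphen format and of the second filtering pass recommended above, here is a minimal Python sketch. How SEE applies the PSF internally is not described in this guide, so the function names `parse_psf` and `drop_excluded` are hypothetical, and the second pass is only in the spirit of the White List/Filter, not a copy of it.

```python
def parse_psf(lines):
    """Return the set of domains named by hyphen-prefixed entries such as '-gmail.com'."""
    domains = set()
    for line in lines:
        entry = line.strip().lower()
        # Only entries written as a hyphen followed by a domain are accepted.
        if entry.startswith("-") and len(entry) > 1:
            domains.add(entry[1:])
    return domains


def drop_excluded(emails, excluded_domains):
    """Second pass: remove any address whose domain is on the exclusion list."""
    kept = []
    for email in emails:
        domain = email.rpartition("@")[2].lower()
        if domain not in excluded_domains:
            kept.append(email)
    return kept


excluded = parse_psf(["-gmail.com", "-yahoo.com"])
print(drop_excluded(["a@gmail.com", "owner@videostore.example"], excluded))
```

Entries that do not start with a hyphen are ignored here rather than guessed at, which mirrors the guide's advice to follow the format exactly.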

Create Custom Filename

This topic was previously discussed in the following entry within this guide:

Quick Start Guide > Starting a Session

Where the user does not enter a filename, the program will automatically create one based on the current date and time.

Note: Only use traditional alpha-numeric characters for creating filenames. Anything else will cause your lists to be inaccessible.
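The filename rules above can be sketched as follows. This is a hypothetical Python illustration: the function name `session_filename` and the timestamp layout are assumptions, since the guide does not state the exact format SEE uses for its automatic filenames.

```python
import re
from datetime import datetime


def session_filename(user_input="", now=None):
    """Validate a user-supplied filename, or build one from the date and time if none is given."""
    name = user_input.strip()
    if name:
        # Accept only traditional alphanumeric characters (A-Z, a-z, 0-9).
        if re.fullmatch(r"[A-Za-z0-9]+", name) is None:
            raise ValueError("use only letters and digits in the filename")
        return name
    # No name supplied: fall back to the current date and time.
    # The YYYYMMDDHHMMSS layout is an assumption, not SEE's documented format.
    now = now or datetime.now()
    return now.strftime("%Y%m%d%H%M%S")


print(session_filename("MyList2024"))
```

Rejecting anything outside letters and digits up front avoids producing a list file that cannot be opened later.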

Geographically-based Emails

UPDATE: Review these two video tutorials if you would like more information on extracting from a specific country:

Extract Emails From Any Country - Social Email Extractor Advanced Tutorials
Extract Emails From Any Country (PART 2) - Social Email Extractor Advanced Tutorials

Selecting a country here is only one way to target emails from a specific country. If you intend to scrape from a specific country, then it would be wise to use a list of Search Criteria that has SMTP providers from that country. Think about this for a second: if you use an SMTP provider based in and intended for users within the USA, what relevance would it have to a different country? If you select a Search Criteria list based primarily out of the States and search with it within another country, your results won’t be as promising as they would be with SMTP providers from the country you are scraping.

Note: Do not assume that simply selecting a country will ensure that every email address is from that country.

You can create highly targeted email lists based on country; however, you will have to review the tutorials noted above to learn how to go about this intelligently. By default, the USA is the main region and the Search Criteria are primarily related to North America.

Maximum Email Count

The default number of email addresses you can extract within a session is 100,000. If you look at the history of email extractors, their capacity to extract and the number of results returned, you’ll realize that this is actually a very high number. You can adjust the default by using the slider bar. The maximum you can extract within a session is 500,000. Originally, the maximum was 1,000,000 emails; it was reduced only because lists of that size were difficult to manage and took forever to export and open. The current default and maximum settings are in place for a reason.

Maximum Page Numbers Returned

This refers to the depth to which SEE will go in order to scrape pages. The higher the number, the more in-depth SEE will scrape to find emails. However, this will in turn cause more duplicates. For users who want a tightly targeted list, a maximum page number of 200 should be sufficient. For users who want a broader list of every possible email relating to a niche, at the cost of more duplicates, a maximum page number of 1000 should be used.

Remove Domain Name in Search Results

This function does exactly what it says. The domain name refers to the website selection. If, for example, you chose facebook.com as one of your websites, then by default this function will not allow email addresses such as jane@facebook.com. You may, however, wish to include the domain name where you have a list of Custom URLs and want to scrape or otherwise gather email addresses from those specific domains, that is, addresses that contain the domain name. This would work in conjunction with the General Interest Search Criteria list and the PSF function.
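The default behaviour and the opt-out can be pictured with a short Python sketch. The function `remove_site_domain` and its `keep_domain` switch are hypothetical names standing in for the option described above; SEE's own implementation is not documented here.

```python
def remove_site_domain(emails, site_domains, keep_domain=False):
    """Drop addresses hosted on the selected websites unless keep_domain is set."""
    if keep_domain:
        # Option switched off: addresses on the website's own domain are kept.
        return list(emails)
    blocked = {domain.lower() for domain in site_domains}
    return [e for e in emails if e.rpartition("@")[2].lower() not in blocked]


found = ["jane@facebook.com", "owner@videostore.example"]
print(remove_site_domain(found, ["facebook.com"]))
print(remove_site_domain(found, ["facebook.com"], keep_domain=True))
```

With a list of Custom URLs, passing `keep_domain=True` corresponds to the case where addresses on those domains are exactly what you want to gather.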

Difference between Optimization and SRO Modes
The very first version of SEE displayed email results in much the same way as the Status Results-Orientated (SRO) Mode. It literally sent the emails to the page as they were being extracted so that users could see the results. Because email lists were getting so large (in excess of 100k), this had an impact on some users’ computer memory. I decided to create a more ‘relaxed’ status display for the email extraction process, and the Optimization mode was the result. Both modes work identically with respect to the extraction process; however, because SRO returns more information to the user, it may technically take a bit longer. SRO mode might also become unstable with extraction lists exceeding 100k.

Optimization Mode: Use for optimal computer performance and lists exceeding 100k. This mode tells you the server status and server response times.

SRO Mode: Use to visually see your results as you extract. A brief list of every single email extracted will be seen to the right during the extraction process. Use for lists up to 100k. SRO is short for 'Status Results-Orientated' Mode. It was how the original version of the program displayed emails.