Website Extractor Home - Internet Software

Click here to download Website Extractor (640 kB).

Website Extractor Documentation (*.rtf).

Website Extractor is shareware.

The Website Extractor

Content

  • Introduction
  • Getting started
  • General setting
  • Creating a New Project

  • How can you download a whole website or any part of it onto a disk?

    Like most people, you have probably run into this problem at one time or another. Browsers such as Internet Explorer or Netscape Navigator were designed to save one page at a time. If the site consists of 1000 pages, you would have to click your mouse 1000 times and choose a directory 1000 times to save the files. Now another option is available: the new version of the Website Extractor program. All you have to do is enter the address of the website; the program takes care of the download. Then you just wait a short time while the program copies all or part of the website you have requested.

    The Website Extractor program is conveniently designed to download Internet websites exactly the way you want them, including or excluding any parts you need or don't need (such as directory, domain and file names, types of files, their size or any other properties).

    The Extractor can download up to 100 files at a time, which saves you a huge amount of time compared to ordinary browsers. All data retrieved are stored in the directory you select and contain only the files and directories matching your filter instructions.

    A broad range of customized settings for downloading web pages will enable you to limit the scope of files retrieved to such types as jpg or html files.

    Website Extractor can automatically re-download any files that were not copied due to transfer errors or bad connections. The program can run through a proxy server and can download only revised or new files, skipping documents that have already been copied.

    The Extractor is essentially a search robot and is designed for fast-track navigation through the hyperlinks of cyber space, downloading web pages at the user's request. It offers numerous settings and options to facilitate this task. You can also limit your search by domain types (such as com, net, uk, etc.) by using sophisticated filtering options based on a list of key words and other auxiliary options. 

    To download the Website Extractor program, go to:
    http://www.asona.org/

    How the program works
    Main menu
    After you launch the program, the main menu will appear on your screen.

    The main menu consists of the following options:

    • New - to start a new search
    • Open - to open a search in progress
    • Reopen - to quickly open one of the eight most recently used searches
    • Save - to save a search project that has been initiated
    • Save as - to save a search project under a new name
    • Default options - project options that run by default
    • Exit - to exit the program

    General settings

    Before running the program it is advisable to adjust the general settings.
    To do this, launch the program and choose Default Options.

  • Download files
  • Follow new links / URL
  • Copy subdirectory structure from website
  • Extract local link
  • Stay within initial domain list.
  • Links level limit
  • Number of connections
  • Save results automatically
  • Time out for one connection
  • Number of retries
  • Swap URL count
  • Does not visit twice already scanned site
  • Apply domainname.com = www.domainname.com
  • Expand the nodes parents to make the node visible
  • Identify browser as
  • File Type Filter: Limiting the types and sizes of files
  • URL / Domain Filter
  • Domains: Limitations by domain type

    The first thing to do is decide which directory (new path) you will use to save project files, and the path to the directory for saving files copied (downloaded) from the Internet.

    Download files - this option is used to download files onto your hard drive.
    If this option is not checked, the program only saves a list of the scanned hyperlinks to a special file.

    Enter the proxy server properties (if you use a proxy server).

    Then choose any other options you would like to use in downloading and searching for hyperlinks.

    Let's take a look at the various options available.

    Follow new links / URL - to follow hyperlinks automatically. This option lets the program automatically extract other websites linked to the one you are scanning.

    Copy subdirectory structure from website - to copy the subdirectory structure of the website you wish to download. If this option is checked, the program creates directories on your hard drive that match the ones on the website you are downloading.
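
    The mapping from a URL to a mirrored local path can be sketched as follows. This is only an illustration, not the program's actual code: the function name local_path_for, the "downloads" root directory and the index.html default are all assumptions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path_for(url: str, root: str = "downloads", mirror_dirs: bool = True) -> str:
    """Return the path under `root` where `url` would be saved (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    # A URL with no path or a trailing "/" names a directory;
    # assume its page is stored as index.html.
    if path == "" or path.endswith("/"):
        path += "index.html"
    remote = PurePosixPath(path.lstrip("/"))
    if mirror_dirs:
        # Option checked: recreate host/directory/file on disk.
        return str(PurePosixPath(root) / parts.netloc / remote)
    # Option unchecked: put every file directly in the root directory.
    return str(PurePosixPath(root) / remote.name)
```

    With the option off, files with the same name from different directories would collide in this sketch, which is one reason mirroring the structure is useful.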

    Extract local link - to search for local hyperlinks. This option allows you to search for local links on the website you are scanning, i.e. links that refer to other documents on the website.

    Stay within initial domain list - A very convenient option that lets the program extract (but not download) hyperlinks to websites not included in the original list of addresses. Here you decide whether you need to download other websites referred to by the one you are downloading. With this option on, you will only download files from the sites you listed. With it off, the sites linked to the one you are scanning will also be downloaded.

    For example, you only need to download a list of (URL) addresses
    http://www.Internet-soft.com/DEMO
    http://www.Esalesbiz.com/extra

    and you don't need to download other domains linked to the original list of domains (e.g. http://vista.ru).
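
    The check behind this option might look like the sketch below. It is an assumption about the behavior, not the program's own logic, and the names host_of and within_initial_domains are hypothetical.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL ("" if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def within_initial_domains(link: str, start_urls: list[str]) -> bool:
    """True if `link` points at a host that appears in the initial address list."""
    allowed = {host_of(u) for u in start_urls}
    return host_of(link) in allowed


# The initial list from the example above.
START = ["http://www.Internet-soft.com/DEMO", "http://www.Esalesbiz.com/extra"]
```

    Here within_initial_domains("http://vista.ru", START) is False, so that link would be recorded but not downloaded.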

    Links level limit - the number of download levels, that is, the maximum number of hyperlink steps the program will follow.

    An example will help to illustrate this option. Let's assume there is a hyperlink from one site to another, a link from the second site to the third, and so on. As you can see, a number of hyperlinks must be followed to get from the first site to the last. This option sets the greatest number of hyperlink steps the program will take. Each step can lead to a number of other websites. So if you have selected only one level, you will only be able to copy the websites (let's call them X1 websites) that are linked from the website you are downloading (scanning), and not the sites linked from the X1 websites.
    The following chart shows how the links level limit works.
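
    In code terms, a level limit behaves like a breadth-first traversal that stops expanding pages once they are the given number of steps away from the start. The sketch below works on an in-memory link table rather than the live web; the function name crawl_levels and the sample page names are assumptions made for illustration.

```python
from collections import deque


def crawl_levels(start: str, links: dict[str, list[str]], level_limit: int) -> dict[str, int]:
    """Return {page: level} for every page reachable within `level_limit` steps."""
    seen = {start: 0}          # also prevents visiting a page twice
    queue = deque([start])
    while queue:
        page = queue.popleft()
        level = seen[page]
        if level == level_limit:
            continue           # do not follow links beyond the limit
        for target in links.get(page, []):
            if target not in seen:
                seen[target] = level + 1
                queue.append(target)
    return seen


# Hypothetical link table: start -> X1 sites -> X2 site -> X3 site.
LINKS = {"start": ["x1a", "x1b"], "x1a": ["x2"], "x2": ["x3"]}
```

    With level_limit=1 only the start page and the X1 pages are returned; raising the limit to 2 adds the X2 page.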

    Number of connections - In this item you enter the number of simultaneous connections.

    As a rule 3 - 10 connections are made. The optimal number of connections will depend on the number of lines you have and the connection speed of your provider.
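
    The idea of a fixed number of simultaneous connections can be illustrated with a thread pool. This is a sketch under assumptions: download_all is a hypothetical name, and fetch stands for whatever function actually retrieves one URL (it is passed in so the example does not depend on a network).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def download_all(urls: Iterable[str], fetch: Callable[[str], object], connections: int = 5) -> dict:
    """Fetch every URL using at most `connections` worker threads at once."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=connections) as pool:
        # pool.map returns results in the same order as url_list.
        results = pool.map(fetch, url_list)
        return dict(zip(url_list, results))
```

    Raising connections beyond what the line can carry only makes the workers compete for the same bandwidth, which matches the advice above to stay in the 3 - 10 range.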

    Save results automatically - To save your results automatically every N minutes.

    This option shows how frequently your interim search results are to be saved.

    Time out for one connection - This option gives the maximum amount of time in seconds during which each document (one connection) is downloaded.

    At the end of this time the program starts downloading the next document.

    Number of retries - The number of attempts made to download each document.

    This option shows the number of attempts to download the same file if the provider connection or website link is broken off. The program will make as many attempts to download as you specify.
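
    The interplay of the timeout and the number of retries can be sketched as a simple loop. The names fetch_with_retries and fetch are hypothetical; fetch is assumed to take a URL and a timeout in seconds and to raise OSError (which includes TimeoutError) when the connection fails.

```python
from typing import Callable


def fetch_with_retries(url: str, fetch: Callable[[str, float], bytes],
                       retries: int = 3, timeout: float = 30.0) -> bytes:
    """Try to download `url` up to `retries` times, each attempt limited to `timeout` seconds."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _attempt in range(retries):
        try:
            return fetch(url, timeout)
        except OSError as exc:   # timeouts and broken connections
            last_error = exc
    # Every attempt failed: report the last error to the caller,
    # who can then move on to the next document.
    raise last_error
```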

    Swap URL count - The number of addresses added to the list of tasks (tree of downloadable addresses).

    Does not visit twice already scanned site - This option makes the program skip addresses that have already been scanned.

    Apply domainname.com = www.domainname.com

    On some sites the hyperlinks to other sites omit the leading www, so when the same documents are downloaded they may be saved twice in different directories. This option is designed to deal with this anomaly. If you check this option, INTERNET-SOFT.COM and WWW.INTERNET-SOFT.COM will be treated as the same address. The address is automatically prefixed with www in this type of search.
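
    One way to picture this normalization is the sketch below, which lower-cases the host and adds the www prefix when it is missing. The function name canonical_url is an assumption, and the sketch deliberately ignores cases such as user names in the URL or hosts that are already subdomains.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return `url` with a lower-cased host that always starts with "www."."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host and not host.startswith("www."):
        host = "www." + host
    # Keep an explicit port if the URL had one.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

    Both http://INTERNET-SOFT.COM/DEMO and http://WWW.INTERNET-SOFT.COM/DEMO come out as the same string, so the document is stored only once.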

    Expand the nodes parents to make the node visible - This convenience option is intended to graphically represent the tree of websites scanned.

    In this way the option shows the current branches of the site being downloaded and enables the program to graphically depict the locations where sites are downloaded.

    Identify browser as - This option sets how the program identifies itself to the remote server when a website is downloaded.

    For example, when you download a page using Internet Explorer 5.0, the remote server registers the request and records the browser identification in its log. The Extractor program is recorded in the same way when you visit a website.
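
    In HTTP terms this identification travels in the User-Agent request header. The sketch below only builds a request with Python's standard library and does not send it; the function name build_request and the sample identification string are illustrative assumptions, not values taken from the program.

```python
from urllib.request import Request

# Illustrative identification string in the style used by Internet Explorer 5.0.
SAMPLE_AGENT = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"


def build_request(url: str, user_agent: str = SAMPLE_AGENT) -> Request:
    """Create a request whose User-Agent header carries the chosen identification."""
    return Request(url, headers={"User-Agent": user_agent})
```

    The server sees whatever string is supplied here and typically writes it to its access log.
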
    We would like to draw your attention to the following:
    Since the worldwide web contains a huge number of pages great data processing power may be needed as well as a large amount of disk space on your computer to download links and websites. A few hours of work by the program may take up many gigabytes on your hard disk.

    File Type Filter: Limiting the types and sizes of files.
    You can use this option to specify the types of files you want to download and limit their size.
    This is important, for example, when you only want to download text documents without banners, pictures or archive files.
    In this case, check the option beside html, htm, txt and shtml, etc. files.
    You can use these menu options to limit the size of files to be downloaded. If you have selected "Load all file sizes", files of all sizes will be downloaded. Otherwise you will only get the sizes (specified in bytes) you have selected.
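
    A file type and size filter of this kind can be sketched as a single predicate. The function name passes_file_filter and its parameters are assumptions; leaving both size limits unset corresponds to "Load all file sizes".

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit

# The text-only selection mentioned above.
TEXT_TYPES = {"html", "htm", "txt", "shtml"}


def passes_file_filter(url: str, size: int, allowed_exts: set,
                       min_size: Optional[int] = None,
                       max_size: Optional[int] = None) -> bool:
    """True if the file's extension is allowed and its size (in bytes) is within limits."""
    ext = PurePosixPath(urlsplit(url).path).suffix.lower().lstrip(".")
    if ext not in allowed_exts:
        return False
    if min_size is not None and size < min_size:
        return False
    if max_size is not None and size > max_size:
        return False
    return True
```

    Note that in this sketch a URL without a file extension, such as a bare directory, is rejected; a real tool would need a rule for that case.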

    URL / Domain Filter: Limitations by names of directories, domain names and files.
    You can make limitations by entering certain words in domains. Let's say you're downloading files only from www.yahoo.com. You would only enter yahoo as the filter word.

    The filter can be applied separately to:

      • words contained in the domain name;
      • the domain extension;
      • words contained in a directory name;
      • words contained in the file name.
    The filter can be used to include or to exclude. If you enter words into the exclude filter, any URL containing one of these words will not be downloaded. If you use the include filter, only URLs containing the specified words will be downloaded.
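
    The include and exclude rules can be sketched as follows. For brevity this example matches words against the whole URL rather than separately against the domain, directory and file name, which is a simplification; the function name passes_word_filter is an assumption.

```python
from typing import Iterable


def passes_word_filter(url: str, include: Iterable[str] = (), exclude: Iterable[str] = ()) -> bool:
    """Apply the exclude words first, then require an include word if any are given."""
    text = url.lower()
    if any(word.lower() in text for word in exclude):
        return False          # an excluded word always blocks the URL
    include = list(include)
    if include and not any(word.lower() in text for word in include):
        return False          # include list given, but nothing matched
    return True
```

    With include=["yahoo"], an address on www.yahoo.com passes and an address on another domain does not.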

    Domains: Limitations by domain type.
    This option enables you to make limitations by type and country of the domain.

    To do this click on the requested domain type. This is all you have to do for the main program settings. When you exit the menu window you save by default the data you have entered and you can proceed to download websites and e-mail addresses.

    Now we can start a search project. The default properties you have entered will automatically be called up when you start a new project. These properties can be altered and saved for a later time for each separate project.

    The "search and download website" concept is at the heart of this system. The term "project" therefore refers to the full set of options that define which sites are to be downloaded and with which properties.

    WEBSITE DOWNLOADING

    To start a new project, press "New".
    An interface to define search and download (website) criteria will appear on your screen.


    Then, in the left window, enter the list of websites (URLs) which you would like to download. By pressing the Load button you can load the list from a text file.

    By pressing "Options" you enter the specific properties for your search project.  Here you should check your directories and make sure you have enough disk space to download the required websites.

    Search properties are entered the same way as default properties.

    After you have assigned the search parameters, close this window and save the project by pressing "save as".  Then give the project file any name you choose.

    For example, you could call the project yahoo.pro.

    Then proceed to download the data by pressing the "Download" button.

    The properties that are most often changed are "download files" and "number of connections".  These properties are conveniently located at the top of the search window toolbar to avoid having to exit this environment and enter data in another window.

    After downloading data we recommend pressing "save as" or "save".

    In this way you will be able to reuse these properties, if needed, at a later time.

    To access existing projects, press "open" and choose the name of the project.

    You can re-download websites by pressing the "download" button or by continuing the download from the last site you have visited.  To do this select the proper file on the "download tree" and press "Resume".

    The Extractor program is developed by Asona.

    New Software
     
  • Internet Search Office
  • MailList Express 
  • DB Maker
  • Domain Quester Pro 
  • FTP Navigator 

    Internet Search Office
    Includes five programs in one package:

  •  Newsgroup Explorer 
  •  Link Extractor 
  •  Webscape 
  •  Local URL&E.mail Questor 
  •  MailList Express 

  • http://www.esalesbiz.com/web/demo/internetsearchoffice.exe


    MailList Express  Once you have a targeted email list, MailList Express lets you create e-mail message templates. You write one message that is sent to everyone on your list, individually addressed and with personal data merged into the message body. All transactions are saved in a log file. You can set the charset and send messages in HTML format. This is an immensely efficient method of communicating with both existing and potential clients. The program can also determine which addresses in a mailing list are unavailable. It works on the same algorithm as an SMTP mail server: mail server addresses are looked up in DNS, then MailList Express connects to the SMTP servers, simulates sending a message, and disconnects as soon as the mail server reports whether or not the address is valid.
    http://www.internet-soft.com/DEMO/maillistexpress.exe


    DataBase Maker  allows you to extract specific data from HTML and TXT documents and create text databases. The program lets you analyze and search a selected document for specified data, and convert the unstructured results into a text file for further processing in applications such as Microsoft Excel and MailList Express.
    http://www.internet-soft.com/DEMO/dbmakersetup.exe

    Domain Quester Pro
    All the good domains are taken!?  This is an age-old complaint, made more plaintive by startups and .coms in search of a good domain name.
    Now there is Domain Quester, a program designed to search for available domain names, generate the greatest possible number of word combinations and then select the best combination for naming a business on the Net. It is as simple as entering a list of keywords on the Quester Web site at http://www.internet-soft.com/DEMO/questerprosetup.exe


    FTP Navigator  is a Windows-based Internet application that facilitates FTP transfer by displaying information about the files and directory structure of a remote system in a browsing screen. This allows you to use the capabilities of FTP without having to know all the details. Its side-by-side view lets you carry out simple file management both locally and remotely. FTP Navigator gives Internet users the ability to quickly upload, download, delete and rename files, and to create and delete directories on an FTP server. The FTP Navigator application can connect to nearly any FTP site, whether it is a Windows NT, VMS or UNIX site. This FTP program has a lot more than just a pretty interface. In addition to the features above, FTP Navigator provides advanced FTP features such as resuming uploads and downloads, multiple-file transfers and custom commands.
    http://www.internet-soft.com/DEMO/ftpnavigator.exe
    Internet-Soft.Com
    To contact us: extra@esalesbiz.com

    Copyright © 2000 - 2002, Asona.