Program’s execution process

FullText files downloading

During the program’s execution, the following steps take place:

1) The given command-line arguments are validated and the related variables are set.
2) The input-json-file is parsed and the id-url pairs are loaded in batches.
3) Each id-url pair is added to the “callableTasksList” and is then processed by one of the running threads.
4) For each id-url pair, the url is checked against known problematic urls (those which give neither the landing-page of the publication nor the full-text (or dataset) itself).
5) Each url is prepared for connection. If it belongs to a “special-case” domain, a dedicated handler is applied, which transforms it into either a full-text url or a “workable” landing-page.
6) If the url gives a full-text (or dataset) file directly (even after redirections), the file is saved immediately (if that option is chosen) and the results are queued to be written to disk in due time.
7) If the url leads to a web-page, the following steps take place:

HTML files downloading

If the program receives the “-downloadJustHtmlFiles” argument, it downloads only the HTML files and ignores all other cases.
The execution is the same for steps 1 to 5.
Then, in step 6, if the url is a full-text one, it is simply logged, not downloaded.
If, instead, it is a landing-page, the HTML is downloaded and stored. Neither meta-checks nor internal-links extraction take place.
Finally, steps 8 and 9 of the full-text plan take place.
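
As a rough illustration of how the two modes differ, the batch loading, the hand-off to worker threads and the decision taken in step 6 can be sketched as below. This is a minimal sketch in Python, not the program's actual code: the names `batches`, `handle_pair`, `run` and `Result` are hypothetical, the content type is passed in as plain data instead of being read from a real HTTP response, and treating `application/pdf` as "full-text" is an assumption made only for the example. The "save only if that option is chosen" condition of the full-text mode is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical input record: (id, url, content type the server would report).
Pair = Tuple[str, str, str]


@dataclass
class Result:
    pair_id: str
    url: str
    action: str  # what the sketch decided to do with this url


def batches(items: Iterable, size: int) -> Iterator[list]:
    """Yield the items in lists of at most `size` elements (step 2)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def handle_pair(pair_id: str, url: str, content_type: str, html_only: bool) -> Result:
    """Decide what happens to one id-url pair (step 6 in both modes).

    No network access happens here; `content_type` stands in for the response.
    """
    is_fulltext = content_type == "application/pdf"  # assumption for this sketch
    if html_only:
        # HTML-only mode: full-texts are only logged, landing-pages are stored.
        action = "logged-fulltext" if is_fulltext else "saved-html"
    else:
        # Full-text mode: full-texts are saved, web-pages go on to step 7.
        action = "saved-fulltext" if is_fulltext else "needs-page-analysis"
    return Result(pair_id, url, action)


def run(pairs: Iterable[Pair], html_only: bool,
        batch_size: int = 2, workers: int = 4) -> List[Result]:
    """Process all pairs batch by batch with a pool of worker threads (step 3)."""
    results: List[Result] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches(pairs, batch_size):
            # One task per id-url pair; map() returns results in input order.
            results.extend(
                pool.map(lambda p: handle_pair(*p, html_only=html_only), batch)
            )
    return results
```

Calling `run(pairs, html_only=True)` on a mix of PDF and HTML urls would mark the PDFs as logged only and the landing-pages as stored, while `html_only=False` would save the PDFs and pass the web-pages on for further analysis.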