
rewritten webarchiver plugin
--- 2008-02-25
* Version r3/3.5.8 to r4/3.5.9
* Fixed failed assertion if KHTML does not parse a STYLE area
* use "Verify" cache strategy if working together with the original http slave
--- 2007-11-20
* Version r2/3.5.7 to r3/3.5.8
* fixed crashes on webpages that somehow create internal DOM nodes without children.
* fixed handling of webpages that have more than one style area.
* very small performance and error handling optimizations.
-- 2007-06-07
* Version r2/3.5.6 to r2/3.5.7
* no changes in the code itself
-- 2007-03-21
* new patches against KDE 3.5.6.
* Much stricter and more secure URL checking
* several bug fixes
* last directory is now remembered in save dialog
-- 2006-06-20
* new patches against KDE 3.5.3. No changes in the code itself
--
* new patches against KDE 3.5.2. No changes in the code itself
* updated README
* webarchiver sources packaged as patches, not as tar archives
Ratings & Comments
I'd love it if the original URL of the file (the web URL) could be saved in the .war file, maybe as a comment in the tar.gz archive or something (I don't know if this kind of archive can embed a comment), and be accessible through the properties of the file. Maybe a personal comment could be useful too (a note about this page...). It's also annoying to have to click a button to close the dialog when the save is finished. I really hope your improvements will be part of KDE soon :)
It is already saved, although a bit hidden. Open the archived page in Konqueror and press Ctrl-U (or select View->Show Source from the menu). The original URL is saved inside an HTML comment at the top. About meta-information: sounds interesting, but I have not yet looked at how it is handled by KDE.
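To pull that URL out without opening Konqueror, a minimal Python sketch could read the archived page straight from the .war and take the first `http(s)://` string inside the first HTML comment. This assumes the main page is stored as `index.html` at the archive root and that the URL is the first link in that comment; the exact comment format is not documented here, and both helper names are hypothetical.

```python
import re
import tarfile
from typing import Optional

# Assumption: the first <!-- ... --> comment in the page holds the original URL.
_COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
_URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def original_url_from_html(html: str) -> Optional[str]:
    """Return the first http(s) URL found in the first HTML comment, if any."""
    comment = _COMMENT_RE.search(html)
    if comment is None:
        return None
    url = _URL_RE.search(comment.group(1))
    return url.group(0) if url else None


def original_url_from_war(path: str, page: str = "index.html") -> Optional[str]:
    """Read the main page from a .war (tar.gz) archive and extract its original URL.

    `page` is an assumed member name; adjust it if your archives differ
    (for example "./index.html").
    """
    with tarfile.open(path, "r:gz") as archive:
        member = archive.extractfile(page)  # raises KeyError if the member is missing
        if member is None:  # the name refers to something that is not a regular file
            return None
        html = member.read().decode("utf-8", errors="replace")
    return original_url_from_html(html)
```

Usage: `print(original_url_from_war("webpage-archive.war"))`.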
I think it would be a good idea to name the extension kwar, webar or something like that. .war is already used by JSP servers and Java application servers like Tomcat, JBoss and WebLogic. At least for me this is not optimal, because the MIME actions in KDE are then also wrong. (Those WARs are ZIP files.) Felix
If .war gets renamed, then I guess all users with existing .war file collections would start revolting. I don't know if there is a way to tell KDE that the same file extension refers to two different file types.
I'm guessing you haven't explored the Properties dialog in Konqueror much, then. Right-click a file and select Properties; to the right of the file type there should be a tool icon. Clicking it lets you edit the file's MIME type, including adding extra extensions to its list. It's also accessible via Konqueror's settings and KControl.
You are talking about it the other way round: a certain file type (for example a Word document) has one or more file extensions (*.doc, *.DOC). The problem here is that two different file types (KDE web archives and Java archives) share the same extension (*.war). But your post made me look at it again. In the MIME type properties dialog it is possible to add another application to a file type and give it higher priority than the default one. In this case, that means adding the Java application server to the list of applications that handle files of type web archive. A drawback is that KDE will then always open .war files with the Java server by default, regardless of whether the file is a web archive or a Java archive.
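The two kinds of .war share an extension but not a container format: Java WARs are ZIP files, while KDE web archives are tar.gz. So a file's leading "magic" bytes are enough to tell them apart, independent of the extension. A minimal Python sketch follows; the function name and return labels are hypothetical, and a gzip header alone only suggests (does not prove) that a file is a KDE web archive.

```python
GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")  # ZIP local file header / empty archive


def classify_war(path: str) -> str:
    """Guess which kind of .war a file is from its first bytes.

    Returns "kde-web-archive" for gzip data (KDE web archives are tar.gz),
    "java-web-archive" for ZIP data (Java WARs are ZIP files), else "unknown".
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(GZIP_MAGIC):
        return "kde-web-archive"
    if head.startswith(ZIP_MAGICS):
        return "java-web-archive"
    return "unknown"
```

A file manager could use a check like this to pick the right handler instead of relying on the extension alone.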
Firefox has MAFF and Konqi has WAR. Testing both, though, I find WAR easier, quicker and more responsive than MAF. It would be great if Firefox could be made to understand WAR through a plugin modeled on the MAF plugin code. What do you say?
* I rarely use Firefox ;-)
* Good idea, but no time, sorry.
Hi, does anyone know whether the WAR format is KDE-only or whether other desktop environments (like GNOME) use it too? I think it's KDE-only, no? Isn't there any unified format for Linux? (And many thanks for these improvements, really.)
It is a zip file. Just unzip it and open it with Firefox if you like!
.war is plain .tar.gz format in disguise; you can extract it with e.g.

$ mkdir webpage && cd webpage
$ tar -xzf ../webpage-archive.war
$ <open index.html with your browser>
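The same extraction can be done from Python's standard `tarfile` module, for example as part of a script. A minimal sketch, assuming (as in the shell commands) that the main page is `index.html` at the archive root; `extract_war` is a hypothetical helper name. Where the interpreter supports it, the "data" extraction filter is used to refuse absolute paths and `..` traversal in untrusted archives.

```python
import os
import tarfile


def extract_war(path: str, dest: str) -> str:
    """Unpack a KDE .war (a tar.gz archive) into `dest`; return the path of index.html.

    Assumes the main page is stored as index.html at the archive root.
    """
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(path, "r:gz") as archive:
        # The "data" filter exists in Python 3.12+ and recent security releases
        # of older versions; fall back to plain extraction where it is missing.
        if hasattr(tarfile, "data_filter"):
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return os.path.join(dest, "index.html")
```

Usage: `extract_war("webpage-archive.war", "webpage")`, then open the returned path in your browser.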
One thing I find really annoying is web pages that show a shrunken picture and, when the picture is clicked, use Java magic to pop up a window with a larger picture in it. The larger picture isn't saved by the web archiver; is there any way to get support for this?
There are two problems:
1) Images or other things loaded by JavaScript, Java or plugins can change unpredictably every time the web page is viewed. For example, an embedded JavaScript may load a different image on the first of each month. Therefore, the webarchiver cannot know beforehand what may lurk inside a JavaScript block.
2) The design of the new (and AFAIK the old) webarchiver is to make a snapshot of the current web page only (as far as that is possible). What you are looking for is a tool that also downloads hyperlinked pages and images. It is possible, but a time-consuming task, to add that to the webarchiver and, frankly, I don't want to add such bloat, because there are already tools out there that do this job. I suggest you use 'wget' or a similar program that can download web pages recursively.
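To illustrate the distinction in point 2 (and the thumbnail case from the question), here is a minimal Python sketch that sorts a page's URLs into embedded resources, which a snapshot of the current page needs, and hyperlinks, which a snapshot does not follow. It is not the webarchiver's code (that works on KHTML's DOM in C++); the class and function names are hypothetical, and because it only sees static markup, anything a script loads later is invisible to it, which is point 1.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ResourceCollector(HTMLParser):
    """Split a page's URLs into embedded resources and plain hyperlinks."""

    def __init__(self) -> None:
        super().__init__()
        self.embedded: List[str] = []    # needed to render this page
        self.hyperlinks: List[str] = []  # other documents; not followed

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if tag in ("img", "script") and a.get("src"):
            self.embedded.append(a["src"])
        elif tag == "link" and a.get("href") and "stylesheet" in rel:
            self.embedded.append(a["href"])
        elif tag == "a" and a.get("href"):
            self.hyperlinks.append(a["href"])


def split_urls(html: str) -> Tuple[List[str], List[str]]:
    """Return (embedded resources, hyperlinks) found in the static HTML."""
    parser = ResourceCollector()
    parser.feed(html)
    parser.close()
    return parser.embedded, parser.hyperlinks
```

For a thumbnail wrapped in `<a href="big.jpg" onclick="...">`, the thumbnail lands in the embedded list while `big.jpg` is only a hyperlink, so a single-page snapshot has no reason to fetch it.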
File a wishlist item at bugs.kde.org and attach your patch to it; this will make it more visible to the maintainers of the code you improved!
I did so a few times, but kept hitting a KDE feature freeze. Someone pointed me to kde-apps.org to get it out so people can test it. In the meantime I was too busy / lazy doing other things (working, for example :-) But anyway, this is a good idea. http://bugs.kde.org/show_bug.cgi?id=98695 http://bugs.kde.org/show_bug.cgi?id=118475
This is a much-needed improvement! Please keep up your work with this!
I have been using these patches privately for about a year because I really wanted them. So chances are good I will support them in future KDE versions :-)