Use Automator To Combine Text Files In Linux
Somebody should make a FAQ for this question; I seem to spend half my life answering it.

Regarding multiple HTML files:

1) If your collection of HTML files includes a file that has links to all the other files:
- Download the beta version of calibre and simply add that file to the GUI. The GUI will automatically create a ZIP file with all referenced HTML files, and the ZIP file will also contain an OPF file that references the HTML files. You can then use 'save to disk' to save the ZIP file somewhere on your disk. Unzip it and use the OPF and HTML files in whatever program you need to.

2) If your collection does not have such a file, it takes two minutes to create one. It will have the form of a plain HTML page that links to each of the other files.

I have been experimenting with multiple HTML files and ran into problems. It took me a while to figure it out, and it may still be a fault on my part! Using SoftSnow as recommended above, I merged my 70-odd HTML files but could not get the ePub to work correctly with Calibre. Then I noticed that SoftSnow had left the closing tag at the end of every segment, whereas the merged file should have only one instance, at the end of the file. Removing the extras has cured my ePub problem. I hope that this helps others.

I know this is an old post, but I have the same problem, so I'm bumping it in the hope that someone has figured this out by now. I have 243 HTML pages that I'm trying to combine into a single Kindle document.
I used the 'DownThemAll!' Firefox extension to download the html files to my computer, and 'Automator' to rename all files numerically with '.html' extensions.
I then created a single HTML file called index.html according to the instructions above, which lists all 243 HTML files as links. When I open index.html in Firefox, it works fine: I can click on any one of the links and it will bring up that page. However, when I add index.html to calibre (version 0.6.33) it only adds the single file without the linked files. If I zip all files together and add the zip to calibre, the result looks the same (although strangely the first way resulted in a 0.0 MB document and the second way resulted in a 1.6 MB document).

Any help would be appreciated.
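For a collection this large, an index file like the one described above could be generated by a script instead of being typed by hand. The following is a minimal Python sketch, assuming all pages sit in one folder and are named so that a numeric-aware sort gives the reading order. The function names `natural_key` and `build_index`, the `Part N` link labels, and the bare-bones markup are illustrative assumptions, not something calibre prescribes; any page with ordinary links to the other files should serve the same purpose.

```python
import html
import re
from pathlib import Path


def natural_key(name):
    """Sort key that orders page2.html before page10.html."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def build_index(folder, index_name="index.html", title="Table of Contents"):
    """Write an index file linking to every other .html file in folder.

    Returns the page names in the order they were linked.
    """
    folder = Path(folder)
    pages = sorted(
        (p.name for p in folder.glob("*.html") if p.name != index_name),
        key=natural_key,
    )
    lines = ["<html>", "<body>", f"<h1>{html.escape(title)}</h1>", "<p>"]
    for number, page in enumerate(pages, start=1):
        # Relative links, so the index only works from inside the same folder.
        href = html.escape(page, quote=True)
        lines.append(f'<a href="{href}">Part {number}</a><br/>')
    lines += ["</p>", "</body>", "</html>"]
    (folder / index_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return pages
```

Calling `build_index("path/to/downloaded/pages")` (a placeholder path) writes index.html next to the pages. Opening the result in a browser and clicking through a few links is still worth doing before handing it to calibre.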
Here is part of a walk-through I posted at a fanfic site I use all the time. I've gotten many responses that it was easy to follow and, better yet, it works! Feel free to PM me if you have any questions.

In order to convert a collection of HTML files in a specific order, you have to create a table of contents file.
That is, an HTML file that contains links to all the other files in the desired order. Such a file looks like:

Table of Contents
Part One
Part Two
Part Three
Part Four
Part Five

2. Copy the text above and paste it into Notepad. Save this file as book.txt in the same folder as the downloaded HTML files. Keep this file open for now.

3. Open the folder where the HTML files were saved and locate the files.
Copy the first file name (e.g., newbook1.html). Go back to the Notepad file you just created and replace the file name in the first line of HTML.

EXAMPLE: You are changing this:
Part One
To (for example):
Part One

As you can see, only part of the file name needs to change, since the rest of the code remains the same. It saves a lot of typing if you just change the file name up to the page number, e.g., copy newbook and paste it before the page number on each line: samplepage1.html, samplepage2.html, etc. Obviously, you have to be sure the naming convention is consistent for each page.

In case this confuses you, here is what I would use to create a single book from As You Wish (10 HTML pages long):

Table of Contents
Part 1
Part 2
Part 3
Part 4
Part 5
Part 6
Part 7
Part 8
Part 9
Part 10

3. Once these changes are completed, save the Notepad file with the name of the story into the SAME FOLDER as the downloaded files.
Locate this saved file and change the .TXT extension to .HTML. You will get a pop-up warning that changing the extension may make the file unusable. Answer YES, you want to change it.

Now you'll want to check the file to confirm the formatting is correct. (If you followed the steps above, it should work!)

4. Double-click the newly created HTML file and it will open as a Web page. You should see something like this:

Table of Contents
Part 1
Part 2
Part 3
etc.

Only "Table of Contents" should be plain text; Part 1, Part 2, etc. will be hyperlinks. If the hyperlinks don't work, check and edit your formatting.
Just change the .HTML back to .TXT and open the file again. Check the specific line of code that looks suspect for any errors. It can be as simple as the first HTML page being named 01 and you forgetting to add the 0 (samplepage01.html vs. samplepage1.html).

Once you think the file is ready, remember to have it in HTML format before moving it to Calibre.

5. Open Calibre and connect your reading device to your PC via a USB cable. It will be recognized by Calibre and will become visible in the Library window.

6. Add the HTML file (drag it in or use the Add button). When the file is added to Calibre, it is converted into a zipped file.

You can now convert the file to the format of your choice.

That doesn't really solve my problem. I can create a TOC file no problem - I've done that part already.
But when I try to add it to Calibre, it brings in only that TOC file and no referenced files (despite all being in the same folder). When I opened the TOC file in Firefox, it was working fine.

I'm using Calibre for Mac, so perhaps it's different on the Windows version?

edit: this message was intended for Katelyn. Slm, I'll give vhtmlmerger a try. I'm guessing by the exe extension it's a Windows program though?

Ruddell, I could take a look at some of the files and try to combine them to determine what the issue might be, but WOW, the iterati program is very cool. It should be what you want, but I'd be concerned about the size of the file with 243 pages.
You may want to break that up into a number of 'books' and create a series. Let me know if iterati doesn't do it for you and we can look further into the issue you're having, unless Kovid knows what might cause that problem. It can be the simplest thing causing it.

Zip the TOC and the referenced files all together and give that to Calibre.

Dale

Also, for merging webpages on the fly as you surf around the web in your Firefox browser, and exporting them as a single HTML file that you can copy to your reader later on, try the ScrapBook Firefox plugin.

For advanced users of Linux (some command line interaction needed): I used the HarvestMan web crawler to get all the HTML pages for a free online ebook that I wanted to read, then I used HTMLDOC to make a nice PDF from the HTML. HTMLDOC is so good that it created the PDF's table of contents from the master HTML page, which worked perfectly in my e-reader.
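Dale's suggestion earlier in the thread, zipping the TOC together with the files it references, can be combined with the link check from the walk-through (samplepage01.html vs. samplepage1.html). Below is a rough Python sketch of that idea using only the standard library. The names `LinkCollector`, `local_links`, and `bundle` are made up for illustration, and it assumes the TOC uses plain relative links to files in its own folder. It does not address whatever caused the import problem reported above with calibre 0.6.33; it only rules out misnamed links before the archive is built.

```python
import zipfile
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def local_links(toc_path):
    """Return the relative file names linked from the TOC, without duplicates."""
    parser = LinkCollector()
    parser.feed(Path(toc_path).read_text(encoding="utf-8", errors="replace"))
    names = []
    for href in parser.hrefs:
        parts = urlparse(href)
        if parts.scheme or parts.netloc or not parts.path:
            continue  # skip web links and bare #fragment links
        name = unquote(parts.path)
        if name not in names:
            names.append(name)
    return names


def bundle(toc_path, zip_path):
    """Zip the TOC plus every linked file; return the links that are missing."""
    toc_path = Path(toc_path)
    folder = toc_path.parent
    linked = [n for n in local_links(toc_path) if n != toc_path.name]
    missing = [n for n in linked if not (folder / n).is_file()]
    if missing:
        return missing  # no archive is written until every link resolves
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(toc_path, toc_path.name)
        for name in linked:
            archive.write(folder / name, name)
    return []
```

A call such as `bundle("index.html", "book.zip")` returns an empty list when the archive was written, or the list of link targets that do not exist on disk, which is usually enough to spot a naming slip.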