Nikse.dk

Subtitle Edit 3.2 RC1

Posted by nikse.dk 2011-08-28 Viewing 61 comments

Subtitle Edit 3.2 is now out in RC1!
The release contains mostly fixes - especially for history/undo/waveform + edit of original subtitle + yet more subtitle formats (SE now supports 60+ formats!).

Please report any bugs/crashes!

Languages included: Chinese, Czech, Danish, English, Polish, and Swedish - thx to the translators :)
Languages in need of update: Bulgarian, Spanish, Basque, French, Hungarian, Italian, Japanese, Romanian and Serbian.

posted by aMvEL at 11-30-08 7:36 PM
Removing HI still removes italic-tag if it's inside italic tag:

NARRATOR: In the criminal

will remove the starting italic-tag if you remove "NARRATOR:"

This is closely related to what happens if you remove [ ] and : at the same time when one is inside the other:

[MUSIC: randommusic] -> randommusic]



Other than that no bugs encountered

posted by Morten at 11-30-08 8:15 PM
In OCR via image compare it cannot recognize I/l/! at all. Can this be improved?

E.g.:
Skal jeg tænde radloen?
Kørl


It recognizes O as G:
"TØNSBERG, NGRGE
965 e.Kr."


Is it possible to use the OCR engine from Subrip, since it is clearly the best manual one available?

posted by nikse.dk at 11-30-08 8:53 PM
@aMvEL: Thx, the examples helped find the bug, should be fixed here (exe only).

@Morten: What type ocr do you use? Tesseract or manual (image compare)?
The subrip code is *very* hard to read!
In my test cases SE 3.2 ocr via image compare is better than subrip... could you send me your test file?

posted by Morten at 11-30-08 9:42 PM
Image compare is manual, or not?

Yeah, I know. Many have tried using it, but unsuccessfully.
But subrip is still the best for dvds in many cases, so their engine must be quite efficient. Unfortunately it is a very outdated application :(

Anyways tried ripping .sup this time and had the problems explained above.

here's the test file http://gupl.dk/64721/

I disabled any sort of autocorrection as i wanted to test how manual ocr worked.

posted by nikse.dk at 11-30-08 10:59 PM
@Morten: Yes, image compare is manual.

Thx for the sub!
This is a Blu-ray subtitle file with a higher resolution than normal dvd subtitles. Image compare does not work well here (can subrip read this file?).

Do try Tesseract (be sure to have both Danish tesseract data files + Danish Hunspell dictionary) - it seems to do a fine job with this file. Tesseract does not add italics, but you can right click in the listview in the ocr window and click on "Save all images with html index". This will show all subtitle images (in a html page) and should make it relatively easy to add italics where the original subtitle had italics.

posted by Morten at 11-31-08 9:12 AM
@nikse. Yes, that is the problem with bluray subtitles. HR. Ofc, you could use suptoidx conversion to make it dvd compliant, but then subrip asks about every single letter because of HR.
No, for .sup you have to use Suprip. Notice the "p". http://exar.ch/suprip/ Unfortunately this has bugs as well, but it works very well apart from that. Italics are a problem in some cases for it though.

I tried Tesseract and found a proper Danish data file. It worked ok, but no italics as you say. But i would rather do manual as that normally is better.

posted by TC at 11-02-09 5:04 AM
Hi nik!
Awesome job on the 2 things i pointed before ('Chars/sec' and 'Total length' counts)! Everything looks great now :)

I just want to make a suggestion: how about a customizable 'Chars/sec'? I mean, the current setting is to highlight 25+ chars/sec (in orange) and 32+ chars/sec (in red).
I work under 21 chars/sec, for instance, and many people certainly use other standards.
How about if it was possible to customize that?

Regards

posted by nikse.dk at 11-02-09 6:58 AM
@TC: Yes, I should probably add char/sec in options in version 3.3. Right now you can change it in Settings.xml.
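
For illustration only, the chars/sec highlighting TC describes could be sketched like this. The 25 and 32 thresholds come from the comment above; the function names and the choice not to count line breaks are assumptions, not how SE necessarily does it.

```python
def chars_per_second(text: str, start_ms: int, end_ms: int) -> float:
    """Characters shown per second for one subtitle.

    Assumption: line breaks are not counted as characters.
    """
    duration_s = (end_ms - start_ms) / 1000.0
    if duration_s <= 0:
        return float("inf")  # zero or negative duration is always "too fast"
    visible = text.replace("\r", "").replace("\n", "")
    return len(visible) / duration_s


def highlight_colour(cps: float, warn: float = 25.0, error: float = 32.0):
    """Return 'red', 'orange' or None; both thresholds are customizable."""
    if cps >= error:
        return "red"
    if cps >= warn:
        return "orange"
    return None
```

Someone working to a 21 chars/sec standard would call `highlight_colour(cps, warn=21)` instead of relying on the defaults.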

posted by vmb at 11-02-09 7:29 AM
Hi Nikse! Any chance of inner and outer drag-and-drop in the nearest builds?

posted by Jan H at 11-03-09 8:52 PM
How can i set the options so the subtitles will be aligned in the middle? (horizontally and also vertically)

posted by aMvEL at 11-04-09 1:44 PM
So the [Music: title] bug is now fixed, thank you..

However the removal of italics bug is not fixed.

[i]GEMMA: Hey, baby.[/i]

becomes

Hey, baby.[/i]

posted by aMvEL at 11-05-09 2:26 PM
I also stumbled upon this, don't know if it's intended or what..

http://i53.tinypic.com/107nw53.jpg

posted by nikse.dk at 11-06-09 3:50 PM
@vmb: I'm not really sure what you mean by inner and outer drag-n-drop - and SE 3.2 should be out soon, so no new features for now.

@Jan H: subtitle position should be available in your video player (or subtitle filter).

@aMvEL: Hopefully fixed now.

posted by aMvEL at 11-06-09 8:53 PM
You've almost got it now, nikse :P

I only noticed that if you have a line-break like this:

[i]NARRATOR:
Previously on NCIS[/i]

it will return

Previously on NCIS[/i]

however if it's [i]NARRATOR: Previously on NCIS[/i], it works as intended.

posted by mboy at 11-06-09 9:16 PM
Something is really off with the timings on the subs when using "Import/OCR subtitle from VOB/Ifo (dvd)".

If i demux the sub/idx files with MeGUI's vobsubber i get the correct timings and can then OCR via SubtitleEdit..

SubtitleEdit timings for first line:
00:00:12,600 --> 00:00:17,276 | correct should be: 00:00:12,612 --> 00:00:17,288

Subtitleedit timings for last line:
01:41:08,434 --> 01:41:12,075 | correct is: 01:40:43,771 --> 01:40:49,084

I doublechecked that the fps was set to 23.976 in SubtitleEdit.

Hopefully you could fix this as using vobsubber first and then OCR'ing with SubtitleEdit makes the subs harder to read for tesseract somehow...

Email me if you need more info or e.g. the IFO file...


posted by vmb at 11-07-09 8:10 AM
Hi nikse.

I've tried to explain in the forum (http://www.sub-talk.net/topic/107-subtitle-edit-open-source-subtitle-editor/page__st__180__p__11162#entry11162) and in the comment in your previous post:

If you add drag-and-drop functionality, please add both an inner drag-and-drop (inside text fields, for word and clause permutation) and an outer one (between SE text fields and other programs, such as digital dictionaries and web browsers). It will be much simpler to edit and translate with drag-and-drop.

What a pity. Well, maybe in the next alphas/betas?

posted by Jo at 11-07-09 11:09 AM
Hi
Thanks for your program. I love to use your auto-translate function. I had no problems so far, but for the last 2 weeks I have had a huge problem. I thought it had to do with 3.2RC1, but it remains with 3.1 too. I also checked it on 2 PCs. When I press the button for translate (powered by Google), it takes a lot of time, like centuries, to finish translating, and when it is done, it is a mess. In the same sentence I can see 2 languages mixed, a bad translation, or a missing sentence. I don't know what's wrong. I hope you can help me.
I also like to ask you if you could make your program not to accpet only VLC for synchro. Not all of us use this VLC. I use Media Player Classic and i dont want to download mix programs.
Thanks.

posted by nikse.dk at 11-07-09 6:45 PM
@aMvEL: Hopefully fixed...

@mboy: It works for my PAL dvds, but I've not tested with NTSC. Is the sub in sync if you do a Sync -> Change framerate (25->23.976)?

@vmb: Yes, I can try after 3.2 final (otherwise there will never be a stable release ;)

@Jo: Hm, it works here. From what language to what language? Also, Google only lets you translate a certain number of lines every x minutes!
What do you mean by "not to accpet only VLC for synchro"?
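
For the Sync -> Change framerate (25->23.976) step suggested to mboy above, the arithmetic can be sketched as follows. This is a minimal sketch assuming the conversion keeps each cue on the same video frame number; the function name is made up for the example and is not SE's code.

```python
def change_frame_rate(time_ms: int, from_fps: float, to_fps: float) -> int:
    """Rescale a timestamp so the cue stays on the same video frame.

    Frame number = time * from_fps; the same frame is shown at
    frame / to_fps when the video plays at to_fps (assumed model).
    """
    if from_fps <= 0 or to_fps <= 0:
        raise ValueError("frame rates must be positive")
    return round(time_ms * from_fps / to_fps)
```

With this model a cue at one minute in a 25 fps timing moves about 2.5 seconds later at 23.976 fps, so the drift grows over the length of the film.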

posted by aMvEL at 11-07-09 7:27 PM
Even closer ... Now it retains the italic-tag in all cases except if there's a colon in the second line. Hopefully this is the last scenario, as I've tested it thoroughly

[i]NARRATOR:
Previously[/i] <- This works

[i]NARRATOR:
Previously:[/i] <- This does not work

(The placement of the colon doesn't matter as long as it's after the line-break.)
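
The cases reported in this thread can be written down as a small test harness. The sketch below is only an illustration, not SE's implementation: it assumes the real subtitle uses `<i>` tags (shown as [i] in these comments), treats only an upper-case label at the very start of the text as a speaker name, and uses a hypothetical function name.

```python
import re

# Optional opening italic tag, an upper-case speaker label ending in a colon,
# then any whitespace (including a line break) after the colon.
_SPEAKER = re.compile(r"^(?P<tag><i>)?(?P<name>[A-Z][A-Z .'-]*):\s*")


def remove_speaker_label(text: str) -> str:
    """Drop a leading 'NAME:' label but keep the opening italic tag."""
    return _SPEAKER.sub(lambda m: m.group("tag") or "", text, count=1)
```

Because the pattern is anchored to the start of the text, a colon on the second line is left alone.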

posted by nikse.dk at 11-07-09 8:36 PM
@aMvEL: Yet another fix...

posted by vmb at 11-07-09 8:56 PM
Thank you, nikse. Good luck with final.

posted by vmb at 11-07-09 9:22 PM
Sorry, nikse, just a small bug with a translate mode:

The original is shown on the waveform. I adjust the time of a subtitle on the waveform and save both files. However, in the list view and the time fields near the edit field the old time is preserved. If I select the subtitle on the waveform, the list view is updated; if I deselect the subtitle, the list view shows the old time again. I need to restart SE to update the list view and time fields.

If the translation is shown on the waveform, there is no bug like this.

posted by aMvEL at 11-07-09 9:51 PM
Seems to be no change in the latest fix, nikse..

posted by nikse.dk at 11-07-09 9:58 PM
@aMvEL: Could you provide an example?

posted by aMvEL at 11-07-09 10:28 PM
Well, I've tried it with the example I provided in my previous post...

You can see it at line 1 here:
http://www.mediafire.com/?s2rioab6jm5f7q5

posted by nikse.dk at 11-08-09 9:55 AM
@vmb: Nice catch :)
Editing waveform with original text should be fixed now.

posted by vmb at 11-08-09 10:25 AM
@nikse: Yeah, everything is OK now. Thank you.

posted by mboy at 11-08-09 3:32 PM
Tested Sync -> Change framerate, it did not solve the problem... But I located the "bug": selecting NTSC when ripping the sub file gives incorrect timings, while using PAL gives correct timings even if the DVD is NTSC..

Image explains better: http://imageshack.us/photo/my-images/402/subleeditbug.png/

posted by nikse.dk at 11-08-09 4:11 PM
@mboy: Thx for the info, I've never tested with NTSC.
Updated version here which uses same calculation for both PAL and NTSC (PresentationTimeStamp/90000=secs).
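
As a rough sketch of the PresentationTimeStamp/90000=secs calculation mentioned above (MPEG presentation timestamps tick at 90 kHz regardless of PAL or NTSC), with helper names invented for the example:

```python
PTS_CLOCK_HZ = 90_000  # MPEG presentation timestamps use a 90 kHz clock


def pts_to_milliseconds(pts: int) -> int:
    """Convert a presentation timestamp to whole milliseconds."""
    return pts * 1000 // PTS_CLOCK_HZ


def format_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT time code, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
```

The frame rate never enters the calculation, which is why the same formula can serve both PAL and NTSC discs.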

posted by mboy at 11-08-09 11:24 PM
Thanks for the update, that seems to have fixed the issue :)

posted by vmb at 11-09-09 11:54 AM
Another bug in the translate mode: after "Insert new subtitle at video pos" all the original subtitle lines in the list view are mysteriously replaced with empty ones. If I select a line, its text appears. Anyway the original edit field of the new subtitle is unavailable, only translate field is active.

posted by nikse.dk at 11-09-09 1:56 PM
Thx vmb - SE updated :)

posted by vmb at 11-09-09 2:47 PM
Fixed. Thx Nikse :)

posted by Buzonyx at 11-09-09 10:36 PM
Is there a way to remove the toolbar or make the icons smaller?

posted by nikse.dk at 11-10-09 8:36 AM
@Buzonyx: In this updated SE version you can remove the toolbar by hiding all buttons in "Options -> settings : Toolbar" :)

posted by vmb at 11-10-09 5:45 PM
Hi, Nikse.

Just a little bit about translate mode:)

If I open SE and it restores the previously opened original and translation, the edit fields are small, shifted to the left side, and their info is misplaced (all these issues happen in undocked mode). I need to reopen the files via the menu to stretch the fields and to place the info correctly.

Tell me if you need some screenshots.

posted by nikse.dk at 11-10-09 7:06 PM
Thx vmb - SE updated again :)

posted by vmb at 11-10-09 9:23 PM
Fixed again. Thx again :)

posted by Aske Vejby at 11-12-09 6:32 PM
Hey cousin - i finally got to try this program, it works smooth and perfectly, good job =)

posted by Andrew at 11-12-09 9:44 PM
The bug with trying to write Settings.xml under Windows 7 (and probably Windows Vista) is still not fixed. I still need to use Task Manager to close the application. (This all occurs when installed under 'C:\program files (x86)' or 'C:\Program files'.)

posted by nikse.dk at 11-13-09 11:16 AM
@Aske: Hi there :)
Thx, I hope you can use it!

@Andrew: That's because you are using a "Portable version" which keeps all settings in the SE folder/subfolders. This version should NOT be installed to "Program files" but rather "C:\Download\Tools\SE" or similar. I should probably mention this somewhere...
SE 3.2 final should be out soon - both with installer and portable zip.

posted by aMvEL at 11-13-09 2:09 PM
I seem to be having some issues when trying to rip a subtitle from DVD. When trying to use Image compare, it seems to think the lines are connected, so that it takes both the letter in the first line and the letter that's right under it on the second line.

Example file: Here


Note: Also, if I try to rip this particular subtitle directly from the IFO/VOB the languages are incorrectly named, possibly because there are two of all languages. This however is not a problem if I rip it to idx/sub before i feed it to SubtitleEdit, which then reports the languages properly.

posted by Gman524 at 11-27-09 10:13 PM
Having a crash when using the Auto-translate (powered by Microsoft). Was not having any problems before then suddenly it began giving me issues...

See Error - http://tiny.cc/subtitle-error





posted by nikse.dk at 11-28-09 6:44 AM
@Gman524: Thx, MS changed their interface a bit, will be working soon...

posted by aMvEL at 11-29-09 5:21 PM
Stumbled upon something..

[i]In Moscow, several politicians,[/i]
[i]police officers and businessmen -[/i]

If you have "Fix '--' --> '...'" selected, it removes the trailing italic-tag, invalidating it like this:

[i]In Moscow, several politicians,[/i]
[i]police officers and businessmen...

posted by nikse.dk at 11-29-09 8:21 PM
@aMvEL: The line connection stuff I'll look at later - probably not easy though.
The italic issue should be fixed here:

posted by aMvEL at 11-01-10 9:51 AM
Yeah I figure that line-connect thing is hard to fix, and that specific subtitle was hard to rip with any of the ocr-programs...

Good job with the italics fix, working as intended :)

posted by mboy at 11-01-10 6:40 PM
Feature request/bug fix:

I have a folder of .jpeg images with subtitles, each line in its own .jpeg file. I "import" these to SE using a Sony BDN XML file, but I have a problem when a subtitle spans multiple lines. SE does not seem to support more than one [Graphic]file.jpeg[/Graphic] per [Event].

So when i specify:
[Event Forced="False" InTC="00:03:39:03" OutTC="00:03:41:02"]
[Graphic]line0001.png[/Graphic]
[Graphic]line0001_2.png[/Graphic]
[/Event]

Only the first [graphic] is included on the OCR process. Could you possibly add this functionality or fix it if it's a bug?
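
To make the request concrete, here is a minimal sketch of reading every Graphic inside each Event. It assumes the real file uses angle brackets (the square brackets above look like a side effect of the comment form), that elements are not namespaced, and that a simplified wrapper element is enough for the sample; it is not SE's importer.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Events>
  <Event Forced="False" InTC="00:03:39:03" OutTC="00:03:41:02">
    <Graphic>line0001.png</Graphic>
    <Graphic>line0001_2.png</Graphic>
  </Event>
</Events>"""


def graphics_per_event(bdn_xml: str):
    """Return (InTC, OutTC, [image file names]) for every Event element."""
    root = ET.fromstring(bdn_xml)
    events = []
    for event in root.iter("Event"):
        # findall returns all direct Graphic children, not just the first
        files = [g.text.strip() for g in event.findall("Graphic") if g.text]
        events.append((event.get("InTC"), event.get("OutTC"), files))
    return events
```

An OCR step would then process each file in the list in order and join the results with line breaks.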

posted by nikse.dk at 11-01-10 7:28 PM
@mboy: Haven't seen that before - try this version...

posted by sialivi at 11-01-10 9:22 PM
@nikse.dk

What's the intended workflow for resuming Importing/OCR:ing after having been forced to stop? Is it to create a new file for each import session and later merge them?

I know it's a bit late for feature requests, but the behavior I was expecting when trying to resume was that any subtitles already loaded in the editor were brought over to the Import/OCR dialog.

Also, I think I found a small bug: Trying to append a .sub to a .srt seem to make the application unresponsive.

posted by mboy at 11-01-10 10:24 PM
Thanks.. Works about half-way :P

Auto-transparent background does not take effect, and adding more than 2 lines only puts [br] on the last one..

Current behavior:
line1 line2
line3

but should be:
line1
line2
line3

Using just 2 lines works fine, it then becomes:
line1
line2

posted by nikse.dk at 11-02-10 9:48 AM
@sialivi: To continue with an ocr, open the image based subtitle, and in the ocr windows you can right-click in the list view and choose "Import text with matching time codes".
Trying to append a .sub to a .srt should not make SE unresponsive now.

@mboy: I've tried to fix auto-transparency (auto-transparency will never be 100% correct though). Does the line break issue still appear?

Latest update

posted by mboy at 11-02-10 2:28 PM
the auto-transparency now takes effect, but using more than 2 lines does still not appear correctly with [/br]

I upped a sample here which you can test with:
http://www.mediafire.com/?3cztc1zuab8rscx

posted by nikse.dk at 11-02-10 2:36 PM
I get this result:


Is it not correct?

posted by mboy at 11-02-10 4:27 PM
That's correct in the display, but press the "Start OCR" button and the results of the OCR process will be wrong...

Never had any problem with how the images are displayed, the problem is the results of the OCR process...

posted by nikse.dk at 11-02-10 5:32 PM
@mboy: He, what if you un-check "Auto break paragraph if more than two lines"?

posted by mboy at 11-02-10 10:29 PM
Indeed, that is the solution :)

My fault for not noticing it...
It now works as i wanted.

Many thanks :)

BTW, have you planned any way of training tesseract inside of SubTitleEdit?

All the tools provided here: http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 are kind of buggy and way too time-consuming...

posted by sialivi at 11-03-10 3:07 AM
@nikse.dk

The import/OCR dialog is a bit unresponsive when it's busy processing, which sounds like it's running on the UI thread. Any chance of doing it on a separate worker thread?

posted by nikse.dk at 11-05-10 2:00 PM
@mboy: No, I've not planned any training of tesseract, but I can see it would be a nice feature.

@sialivi: Yes correct, I'll put it on my todo list (for next version after 3.2).

posted by Tony at 11-07-10 10:42 PM
Hello

Thanks for your beautiful program! I have used it ever since you made it and never had a problem translating from one language to another with the Google auto-translate engine (not Microsoft, because it never worked, but that is not a big problem...)


But my point is that recently I have repeatedly had an annoying problem when translating from one chosen language to another. First of all, some lines are not translated at all! Secondly, the original text, which stands at the left margin, easily interferes with the translated text when I use the translated subtitle on any PC or home media player! What is the problem, sir?

posted by nikse.dk at 11-07-10 10:45 PM
The problem is that both Google and Microsoft are closing their free translation APIs :(

But this version should still work with Google translation: http://www.nikse.dk/SubtitleEdit.zip (MS has added a rather small limit)
