Jon !

Jon's personal blog

And ocrimport is dead.

with 4 comments

I don't want to go into the details for now, as I am still in contact with the Google staff about compensation for the time I spent. In short, my Google Summer of Code mentor decided I wasn't worth working with and shut down my project 2 weeks ago.

Anyway, his decision was (at least from my point of view) crazy, whether seen from a mentor's side or a supervisor's side.

I will post a final article about this once I really know whether I will be paid for the time I spent during May, June and early July.

If you plan to work on this subject, OCR capabilities for OpenOffice or extensions for OpenOffice, feel free to contact me; I am sure I can help.

Written by Jonathan

July 31, 2009 at 9:10

Posted in en

How is OcrImport going?

with 7 comments

I am sorry for the current frequency of my blog posts on the subject; it just means I was pretty busy. My last post related to the subject was a month ago, after spending a lot of time in openSUSE to get this working, since I ran into several problems related to the environment for running and testing the extension... I may post more specifically about those issues!

But finally an "early alpha version" is out, and yesterday I made a video about it.

What features are currently implemented?
The core feature of the extension is here: you are in a Writer document and you can insert text coming from a picture. (OcrImport is an extension, so you won't need a specific version of OpenOffice to run it, just a build for your platform.) The recognition is good, but that is not my work: the extension uses Ocropus for the OCR part, and thanks to Ocropus there is also some layout recognition, such as font size, font weight and text layout (paragraphs).

An additional feature is also implemented: you can import text from several pictures at the same time. This might be useful if you have a directory with 50 to 100 pictures of document pages; you just select all of those files (ctrl-A in the file picker window). There is no progress bar yet, and the OCR takes on average 10s per page on my virtual machine, so 50 pages is a job of roughly 8 to 10 minutes.
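To make the timing above concrete, here is a minimal sketch of the arithmetic in shell. The function name estimate_batch is made up for this example, and the default of 10 seconds per page is simply the figure measured on my virtual machine; your numbers will differ.

```shell
# Estimate how long a batch OCR import takes.
# $1: number of pages
# $2: seconds per page (optional, defaults to the 10 s measured in this post)
estimate_batch() {
  eb_pages=$1
  eb_per_page=${2:-10}
  eb_total=$((eb_pages * eb_per_page))
  printf '%d pages: %d s (about %d min %d s)\n' \
    "$eb_pages" "$eb_total" $((eb_total / 60)) $((eb_total % 60))
}
```

For example, estimate_batch 50 prints "50 pages: 500 s (about 8 min 20 s)", which is why a progress bar would be welcome for 50 to 100 pages.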

What is on the schedule?
The current schedule is quite fuzzy, but three things have to be done with high priority:

  • Package ocropus into the extension, or at least an alternative OCR. For now, the ocropus used is the one installed on the system, so if you don't have ocropus installed the extension won't run (it won't make your copy of OO crash, so feel safe). On Linux, getting ocropus from a package manager is possible (although some package managers do not provide a working version of ocropus), but there is no such user-friendly possibility on MacOS, so including OCR binaries in the extension is really important to make the extension usable.
  • Prepare an MS Windows version. There is about an order of magnitude of difference in the number of people you can reach depending on the platform: there are roughly 10 times more users on Windows than on all the other operating systems in use, so it makes sense to prepare a Windows version, since the Linux and MacOS versions are on the right track by now.
  • Work on the user interface. At the very least there are basic improvements to make, like a progress bar. I will ask some experts in the communities about it; I am looking for good ideas.

That's all I have to say for now from a high-level point of view. I will write some other posts about technical things as soon as possible (SDK vs. dmake, string manipulation, Xtransferable vs. XDispatcherHelper ...), stay tuned ;-) .

If you have questions or feature requests, feel free to drop some comments!!

Written by Jonathan

July 13, 2009 at 8:45

Posted in en, go-oo

How to work in a Linux VM on a Mac (or Unix family)?

with 3 comments

Last week I spent most of my time on an openSUSE VM (I use Parallels, for information) in order to do some distro-specific work on ooo-build (OpenOffice). It was not the best experience I have had on Linux: the keyboard layout sucks, especially for copy/paste (CMD-C / CMD-V have no relevant replacement on Linux), and copy/paste generally sucks anyway as I am addicted to JumpCut (clipboard buffering). Then I thought about it again and found better ways to handle work on a VM and avoid being homesick!! Let me share some tips with you:

iTerm (ssh -X) + Parallels (openSUSE VM) + X11 (OpenOffice from the VM)


  1. Work remotely!
    That's probably the main tip, as all the other tips are detailed applications of this one. If you are used to server administration, just treat your virtual machine as just another server.
  2. Do not use a terminal inside the VM, use ssh instead.
    That's a start. Why use a terminal in your VM instead of the terminal you are used to?
  3. Use X forwarding as much as possible; Macs come with X11.
    ssh -X can save you some time. For example, once connected to my VM with ssh -X user@myvm, I can run soffice with its display on the Mac side.
  4. Create shortcuts for ssh connections in ~/.ssh/config
    Here is an extract from my ~/.ssh/config:


    Host *
      ControlMaster auto
      ControlPath ~/.ssh/master-%r@%h:%p
    Host suse
      HostName 10.211.55.7
      User someUser

    You can use it wherever ssh is involved: ssh suse  or  scp myfile suse:~/someDir
    You can also combine that with public key authentication instead of ControlMaster.

  5. Be lazy, create an alias in ~/.bash_profile
    ~/.bash_profile is sourced at the start of a terminal session; if the file is not present, create it.
    alias suse='ssh -X suse'
  6. Need something more persistent than an ssh session? Use screen
    screen (man screen) is a terminal multiplexer that lets you create a session, detach from it, and reattach later on.
    Basic usage is:
    screen -S ascreenname
    key binding: ctrl-a ctrl-d   # (detach)
    screen -r -d ascreenname    # (reattach)

    That's really useful when you want to create a persistent session or plan to suspend your virtual machine.
  7. Need regular access to your files on the Mac from your VM (or vice versa)? Use SSHFS
    More information about the SSH Filesystem: http://en.wikipedia.org/wiki/SSHFS
  8. Use your favorite text editor.
    Mine is BBEdit, so I can do things from the command line like
    bbedit sftp://suse/
    I am pretty sure your favorite text editor can do this trick as well. Otherwise, check out TextWrangler.
  9. [that's some amount of awesomeness, maybe update your blog? ;) ]

  10. Always use your favorite text editor.
    I was talking with Thorsten lately about strange but awesome behaviors of MacOS applications, and I just learned that MacOS apps using the Cocoa framework are “able to glean the current display context when run from the shell”.

    By using this capability, and a custom script, we can use BBedit directly from the virtual machine.

    1. Let's add some lines to the Mac ~/.ssh/config:

      Host suseroot
        HostName 10.211.55.7
        User root

      It's way more friendly if bbedit can edit files on the VM as root.
    2. Let's add some lines to the VM ~/.ssh/config:

      Host mac
        HostName 10.211.55.2
        User myMacUser

      You should also add a config file in /root/.ssh/config if you want to use it from the root account on the VM.
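    3. Last step, let's create an executable file /usr/bin/bbedit (chmod a+x /usr/bin/bbedit) with this content:

      #!/bin/bash
      # Directory part of the file to edit
      d=$(dirname "$1")
      # Make it absolute when a relative path was given
      if [[ ${d:0:1} != "/" ]]; then d=$PWD/$d; fi

      # Open the file on the Mac over sftp; -w makes bbedit wait until it is closed
      ssh mac "bbedit -w 'sftp://suseroot/$d/$(basename "$1")'"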

    4. Now you can use bbedit as an editor directly on the VM, for example: bbedit someFile.
      You can even use bbedit (/usr/bin/bbedit) as the standard EDITOR for your virtual machine, thanks to the -w option, for example:

      export EDITOR=/usr/bin/bbedit
      crontab -e

    That's all (now I really think that's all). If you see something else, feel free to post some comments. ;-)
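The path handling in the wrapper script of tip 10 can be tried on its own. Below is a minimal sketch, not the exact script: the function name to_sftp_url is made up for this example, and the host alias is passed in as a parameter (suseroot in my setup). It only builds the URL and does not call ssh or bbedit.

```shell
# Build the sftp:// URL handed to the editor for a given file.
# $1: ssh alias of the VM as seen from the Mac (hypothetical parameter)
# $2: file path, absolute or relative to the current directory
to_sftp_url() {
  tsu_host=$1
  tsu_file=$2
  tsu_dir=$(dirname "$tsu_file")
  case "$tsu_dir" in
    /*) ;;                              # already absolute, keep as is
    .)  tsu_dir=$PWD ;;                 # bare file name: use the current directory
    *)  tsu_dir="$PWD/$tsu_dir" ;;      # relative directory: prefix the current directory
  esac
  printf 'sftp://%s/%s/%s\n' "$tsu_host" "$tsu_dir" "$(basename "$tsu_file")"
}
```

For example, to_sftp_url suseroot /etc/crontab prints sftp://suseroot//etc/crontab. The double slash comes from appending an absolute path after the host, as the original script does.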

Written by Jonathan

June 15, 2009 at 3:22

Posted in en

How to install the SDK and compile the C++ examples on MacOS Intel


This post is a MacOS-oriented hello world based on this wiki page: http://wiki.services.openoffice.org/wiki/SDKInstallation. It is also available in French: http://wiki.services.openoffice.org/wiki/FR/Documentation/Installation_du_SDK.

DocumentLoader

Good news: to start extending OpenOffice using UNO, for example to create components or extensions, you don't necessarily need to go through a full build of the source code (I will talk about that process later), which takes 6h30 on my MacBook Pro once started. Today, let's be lazy.

  1. Requirements
  2. Get the SDK
    Download http://openoffice.cict.fr/stable/3.1.0/OOo-SDK_3.1.0_MacOSXIntel_install_en-US.dmg and drag and drop it anywhere; the location does not matter.
  3. Test the SDK
    Open a terminal (Terminal.app or iTerm.app) and look for the OpenOffice.org3.1_SDK folder:

    cd <....>/OpenOffice.org3.1_SDK
    #-- Then set the environment. Follow the instructions; nothing special here, just some settings.
    ./setsdkenv_unix
    #-- You will need to execute this command each time you want to work with the SDK.
    #-- Now go to examples :
    cd examples/cpp/DocumentLoader
    #-- and build
    make
    #-- A short set of instructions is displayed at the end of the build; let's follow them. First start OpenOffice
    soffice "-accept=socket,host=localhost,port=2083;urp;StarOffice.ServiceManager" &
    #-- and execute :
    make DocumentLoader.run

    If everything went fine, you should see a new document, with the content of OpenOffice.org3.1_SDK/examples/cpp/DocumentLoader/test.odt.

Just enjoy, you can now extend OpenOffice with the SDK. For more information, continue reading on this page: http://wiki.services.openoffice.org/wiki/SDKInstallation.

Written by Jonathan

June 3, 2009 at 1:10

Posted in en, go-oo

How to install tesseract-ocr on Linux (or MacOS) ?


Tesseract result


Sometimes it's good to say when things are easy to do. Recently, for my project to integrate OCR into OpenOffice, I tried Tesseract, and it was really simple.

  1. The download

    Take a look at the home page: http://code.google.com/p/tesseract-ocr/ then click on the link tesseract-2.03.tar.gz and unpack the archive.

    wget http://tesseract-ocr.googlecode.com/files/tesseract-2.03.tar.gz
    tar -vxf tesseract-2.03.tar.gz
    cd tesseract-2.03

  2. The install

    You can read the INSTALL file, but this works fine:

    ./configure
    make
    sudo make install

  3. Other downloads (but necessary)

    wget http://tesseract-ocr.googlecode.com/files/tesseract-2.00.eng.tar.gz
    tar -vxf tesseract-2.00.eng.tar.gz
    sudo mv tessdata/* /usr/local/share/tessdata/
  4. How to use it ?

    You'll need a TIFF file. A good way to get one is to convert a PDF.

    wget http://cne-w.ec-nantes.fr/public/jon/ooo/book-1.0-en.tif
    tesseract book-1.0-en.tif mypage
    less mypage.txt
    # And you will see some nice character recognition

  5. What’s next for me ?

    Tesseract provides really good and fast character recognition (3s per page on my computer), but no layout analysis. The output is basically plain text, one line per text line on the input page, but it works on a lot of platforms, so it can be a good alternative where ocropus doesn't work.
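Step 4 runs tesseract on a single file. For a whole directory of TIFF pages, a small loop is enough; here is a minimal sketch, assuming tesseract is installed as above. The function names ocr_output_base and ocr_all are made up for this example.

```shell
# Derive the output base name tesseract should write to:
# the image name without its .tif / .tiff extension.
ocr_output_base() {
  case "$1" in
    *.tiff) printf '%s\n' "${1%.tiff}" ;;
    *.tif)  printf '%s\n' "${1%.tif}" ;;
    *)      printf '%s\n' "$1" ;;
  esac
}

# Run tesseract on every image given as argument.
# tesseract appends .txt to the output base itself.
ocr_all() {
  for oa_image in "$@"; do
    tesseract "$oa_image" "$(ocr_output_base "$oa_image")"
  done
}
```

With this, ocr_all *.tif produces one .txt file per page next to each image, for example book-1.0-en.txt for book-1.0-en.tif.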

Written by Jonathan

May 29, 2009 at 4:09

Posted in en, go-oo

Meeting the JE of Sciences Po Rennes!

with 5 comments

IEP Rennes

As a representative of Centrale Nantes Etudes, I had the chance to get to know the JE (Junior-Entreprise) of Sciences Po Rennes, which was recently created and which we sponsor. It positions itself on the business consulting market (marketing, communication, audit ...) side by side with the JEs of the region's Sup de Co business schools, and exclusively on two other markets, consulting for non-profit organizations and for public administrations, which is extremely promising. See for yourself here: http://www.sprjuniorconseil.com

Apart from that, our meeting was very warm: I was able to join their end-of-year cocktail party in Rennes, and two of them came to see us in Nantes over the Ascension holiday:

SPR JC & CNE

There you go, anyway it was super nice!

Written by Jonathan

May 24, 2009 at 10:11

Posted in CNE

OCR integration on Go-OO

with 2 comments


About 3 weeks ago my student proposal was accepted: I will be taking part in the Google Summer of Code for Go-OO.org. My project is to integrate OCRopus capabilities into OpenOffice, i.e. we plan to be able to import images as ODF.

I have already started: I compiled ooo-build on my Mac and on an openSUSE x86-64 (I still have some issues with Qt on an openSUSE x86-32), and managed to improve my network conditions. The plan now is to build OCRopus successfully, at least on openSUSE (I already ran into some issues on the Mac), and to be able to debug in OO; then I will be ready to really dig into my project. I will also blog as often as possible, under the tag “go-oo”, as the project progresses.


Written by Jonathan

May 11, 2009 at 5:05

Posted in en, go-oo


How to hide your desktop in MacOS X !


Hiding your desktop lets you keep your most entropy-prone place always clean, like this:

a clean desktop (geektools activated)


To do so, simply open the terminal and type :
defaults write com.apple.finder CreateDesktop -bool false

Then restart Finder (right-click on Finder, quit, ...).

Written by Jonathan

May 11, 2009 at 11:22

Posted in Mac Os, en

How to create a new admin-gen theme in symfony 1.2


As I ran into a deep issue in symfony 1.2 yesterday, I decided to blog about it in English, hoping to save somebody else a lot of time. My issue was: “how do I create a theme for my admin-gen?”.

Most admin-gen users know this piece of code in generator.yml

generator:
  class: sfPropelGenerator
  param:
    model_class:           BlogArticle
    theme:                 admin
    non_verbose_templates: true
    with_show:             false
    singular:              ~
    plural:                ~
    route_prefix:          article
    with_propel_route:     1

Let's talk in depth about it, and especially about the theme param. Just a note: don't try with_show, it's not supported in symfony 1.2 ^^ !

First of all, back to basics: what is init-admin, what is generate-admin?

propel:init-admin

Result :

apps/frontend/modules/individu/actions:
actions.class.php
apps/frontend/modules/individu/config:
generator.yml
apps/frontend/modules/individu/templates:

propel:generate-admin

Result :

apps/frontend/modules/entreprise/actions:
actions.class.php
apps/frontend/modules/entreprise/config:
generator.yml
apps/frontend/modules/entreprise/lib:
entrepriseGeneratorConfiguration.class.php      entrepriseGeneratorHelper.class.php
apps/frontend/modules/entreprise/templates:

To be honest, the only difference I see between init-admin and generate-admin is that generate-admin is more complete:

  • It creates routing rules.
  • It creates some helpers.

Anyway, they both work the same way: the modules apps/frontend/modules/individu and apps/frontend/modules/entreprise inherit from cache/frontend/dev/modules/autoIndividu and cache/frontend/dev/modules/autoEntreprise respectively.

OK, let’s go. My first advice :

symfony cc is not enough: it only clears the configuration and basic cache, not the generated code. So rm -r cache/* is the best way to clean the cache.

My second advice is :

Randomly changing the theme param value will not get you far, because no error is thrown if the theme is missing, as reported in ticket #6287! The correct default values are:

  • default if you are using propel:init-admin.
  • admin if you are using propel:generate-admin.

Now let's create a new theme from an existing one.

The documentation says :

Customizing the Theme

There are other partials inherited from the framework that can be overridden in the module templates/ folder to match your custom requirements.

The generator templates are cut into small parts that can be overridden independently, and the actions can also be changed one by one.

Which means you can override parts of the templates contained in cache/frontend/dev/modules/autoIndividu/templates by placing your own versions in apps/frontend/modules/individu/templates.

However, if you want to override those for several modules in the same way, you should probably create a reusable theme. A theme is a sub-set of templates and actions that can be used by an administration module if specified in the theme value at the beginning of generator.yml. With the default theme, symfony uses the files defined in $sf_symfony_lib_dir/plugins/sfPropelPlugin/data/generator/sfPropelModule/admin/.

That's true, but only for a module generated with propel:generate-admin (which, by the way, gets admin as its default theme). Even if it works, be aware that you have to make a full copy of the theme (there is no overriding of the default theme).

Here we go:


# move to root of your project
cd root_of_my_symfony_project
# get the generator's data for symfony 1.2.6
svn export http://svn.symfony-project.com/tags/RELEASE_1_2_6/lib/plugins/sfPropelPlugin/data/generator data/generator
#you can delete what you don't need
rm -r data/generator/sfPropelForm*

Now you have pretty much everything you need for a theme, usable in a propel:init-admin generated module or in a propel:generate-admin generated module:

  • If you want to make a theme for an init-admin module:
    cp -R data/generator/sfPropelAdmin/default data/generator/sfPropelAdmin/mytheme
  • If you want to make a theme for a generate-admin module:
    cp -R data/generator/sfPropelModule/admin data/generator/sfPropelModule/mytheme
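The copy step can be wrapped in a small helper that refuses to overwrite an existing theme. This is only a sketch: the function name make_theme is made up for this example, and it assumes you run it from the project root with the generator data exported to data/generator as above.

```shell
# Create a new admin-gen theme as a full copy of an existing one.
# $1: generator kind (sfPropelAdmin for init-admin, sfPropelModule for generate-admin)
# $2: theme to copy from (default or admin)
# $3: name of the new theme
make_theme() {
  mt_src="data/generator/$1/$2"
  mt_dst="data/generator/$1/$3"
  if [ ! -d "$mt_src" ]; then
    echo "missing base theme: $mt_src" >&2
    return 1
  fi
  if [ -e "$mt_dst" ]; then
    echo "theme already exists: $mt_dst" >&2
    return 1
  fi
  cp -R "$mt_src" "$mt_dst"
}
```

For example, make_theme sfPropelModule admin mytheme does the same as the second cp above, but stops with an error if the base theme is missing or mytheme already exists.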

To activate your theme :

generator:
  class: sfPropelGenerator
  param:
    model_class:           BlogArticle
    theme:                 mytheme
    non_verbose_templates: true
    with_show:             false
    singular:              ~
    plural:                ~
    route_prefix:          article
    with_propel_route:     1

Enjoy the advanced admin-gen customization, and feel free to ping me in the comments if you need more info or spot a problem; you are most welcome ;-)

Written by Jonathan

May 2, 2009 at 4:10

Posted in en, symfony


Crazy week: the end

with 3 comments

No blog post for 10 days: that is to say, for 10 days the stress has been even worse than usual. For a start, our dear EI3s are leaving for their internships, so it's a bit of a kernel panic. Then a project to finish, then my internship to find, but basically what killed me most was this weekend's jet lag. Anyway, I am not the only one in this situation: @Chebitchov, @divarel and @nvaroqua alone have all had a busy week.

divarvel at the JE at night

Divarvel, who will soon be going to sleep at the JE

I am so exhausted that I am not going to finish this article. [I would have slept better at the JE]

Written by Jonathan

April 3, 2009 at 5:22

Posted in fr