• 1 Post
  • 14 Comments
Joined 5 months ago
Cake day: January 25th, 2024


  • For the OCR, have you tried tesseract? For printed documents it can take image input and generate a PDF with selectable text. I don’t OCR much, but it has been useful the few times I’ve tried it.

    You might be able to have a script that feeds the scanner output into tesseract and writes out a PDF. Tesseract only works on a single image per run, so I had to make a script to run it on a whole PDF by splitting it into pages and stitching them back together.


  • Someone already talked about the XY problem, so I’ll say this.

    Why a sound notification instead of the notification content? If your notification program (dunst in my case) has pattern matching or can call scripts based on patterns, and the script has access to the app name, notification title, contents, etc., then it’s just a matter of calling something from your bash script.

    And any time you want to add that functionality to something else, add one more line with a different pattern or add a condition in your script. Comparing text is a lot more reliable than audio.

    Of course your use case could be completely different, so maybe give some examples of it so people can suggest different ways to solve it instead of just the one you’re thinking of.
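To sketch the dunst route: a rule’s script option runs a command with appname, summary, body, icon, and urgency as arguments, so the handler can be a small case statement. The rule name, patterns, and sound files below are made up for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical handler wired up via a dunst rule, e.g. in dunstrc:
#   [build_sound]
#       summary = "*build*"
#       script = ~/.config/dunst/notify-sound.sh
# dunst invokes it as: script appname summary body icon urgency
appname="$1" summary="$2" body="$3"

case "$appname:$summary" in
    *:*"build finished"*) paplay /usr/share/sounds/freedesktop/stereo/complete.oga ;;
    Signal:*)             paplay /usr/share/sounds/freedesktop/stereo/message.oga ;;
esac
```

Adding a new trigger is then one more case branch matching on the app name or summary text.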






  • Hi there, I did say it’s easily doable, but I didn’t have a script because I run things on the image before OCR manually (like the dark-mode negation I try in this script; when doing it manually it’s just one command, since I know whether it’s dark mode or not myself; similar for the threshold as well).

    But here’s one I made for you:

    #!/usr/bin/env bash
    
    # ImageMagick has a cute little command for importing the screen into a file
    import -colorspace gray /tmp/screenshot.png
    mogrify -color-threshold "100-200" /tmp/screenshot.png
    
    # extra magic to invert if the average pixel is dark:
    # shrink to 1x1 and read the single remaining pixel
    # (field positions assume ImageMagick 7's txt: output)
    details=$(convert /tmp/screenshot.png -resize 1x1 txt:-)
    total=$(echo $details | awk -F, '{print $4}')
    value=$(echo $details | awk '{print $7}')
    value=${value#\(}   # strip the parentheses around the pixel value
    value=${value%\)}
    darkness=$(( value * 100 / total ))
    if (( darkness < 50 )); then
       mogrify -negate /tmp/screenshot.png
    fi
    
    # now run the OCR
    text=$(tesseract /tmp/screenshot.png -)
    printf '%s' "$text" | xclip -selection c
    notify-send OCR-Screen "$text"
    

    So the middle part accommodates images in dark mode: it negates the image based on a threshold you can change. Without that, you can just use import for screen capture and tesseract for running the OCR, and optionally pipe the result to xclip for the clipboard or notify-send for a notification.

    In my use case, I have a keybind that takes a screenshot like this: import png:- | xclip -selection c -t image/png, which gives me a cursor to select part of the screen and copies that region to the clipboard. I can save it as an image (through another bash script), or paste it directly into messenger applications. And when I need to do OCR, I just run tesseract in the terminal and copy the text from there.
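The save-to-file script is just that pipe in reverse; a sketch, with the destination path made up:

```bash
#!/usr/bin/env bash
# Dump the PNG currently on the clipboard into a timestamped file.
# Assumes the clipboard holds image/png (e.g. from the import pipe above).
out="$HOME/Pictures/shot-$(date +%Y%m%d-%H%M%S).png"
xclip -selection c -t image/png -o > "$out"
notify-send "Saved screenshot" "$out"
```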


  • Not for handwritten text, but for printed fonts, OCR is as easy as drawing a box on the screen with current technology. So I don’t think we need AI things for that.

    Personally I use tesseract. I have a simple bash script that, when run, lets me select a rectangle on the screen, saves that image to a temp folder, runs OCR on it, and copies the resulting text to the clipboard. Done.

    Edit: for extra flavor, you can also use notify-send to show that text in a notification, so you can see what the OCR produced without having to paste it.
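A minimal version of that select-and-OCR script, assuming ImageMagick’s import, tesseract, xclip, and notify-send are available (the temp-file handling is illustrative):

```bash
#!/usr/bin/env bash
# Select a rectangle on screen, OCR it, and copy the text to the clipboard.
tmp=$(mktemp --suffix=.png)
import "$tmp"                       # drag a rectangle to capture it
text=$(tesseract "$tmp" - 2>/dev/null)
printf '%s' "$text" | xclip -selection c
notify-send "OCR" "$text"           # optional: show what was recognized
rm -f "$tmp"
```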