Updated Python Scraper for EmulationStation

  • #85645
    thadmiller
    Participant

    Hi All,

    I added a number of updates (and some removals) to the elpendor ES-scraper, also adding in chugcup’s title matching algorithm. It’s been working well for me (and for one friend who’s also been testing), so I thought I would share.

    Instructions are on https://github.com/thadmiller/ES-scraper. The simple version is:

    – before running, make sure you have updated the RetroPie Setup script and binaries (the initial 2.x version had invalid XML in es_systems.cfg).

    $ sudo apt-get install python-imaging
    $ git clone https://github.com/thadmiller/ES-scraper.git
    $ cd ES-scraper
    $ python scraper.py -pisize -p

    (Remove the -p if you want to scrape all platforms; add -l if you want to run it in the fully automated “I’m feeling lucky” mode.)

    Thad

    #85658
    brakanje
    Participant
    login as: pi
    pi@192.168.1.129's password:
    Linux raspberrypi 3.18.3+ #740 PREEMPT Wed Jan 21 23:55:56 GMT 2015 armv6l
    
    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.
    
    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    Last login: Thu Jan 22 16:47:59 2015
    
       .~~.   .~~.    Thursday, 22 January 2015,  5:33:34 pm UTC
      '. \ ' ' / .'   Linux 3.18.3+ armv6l GNU/Linux
       .~ .~~~..~.
      : .~.'~'.~. :   Filesystem      Size  Used Avail Use% Mounted on
     ~ (   ) (   ) ~  rootfs           29G  7.1G   21G  26% /
    ( : '~'.~.'~' : ) Uptime.............: 0 days, 00h45m51s
     ~ .~       ~. ~  Memory.............: 64268kB (Free) / 250872kB (Total)
      (   |   |   )   Running Processes..: 76
      '~         ~'   IP Address.........: 192.168.1.129
        *--~-~--*     The RetroPie Project, www.petrockblock.com
    
    pi@raspberrypi /ES-scraper $ python scraper.py -pisize -l
    Traceback (most recent call last):
      File "scraper.py", line 586, in <module>
        ES_systems = readConfig(config)
      File "scraper.py", line 84, in readConfig
        config = ET.parse(file)
      File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
        tree.parse(source, parser)
      File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
        parser.feed(data)
      File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
        self._raiseerror(v)
      File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
        raise err
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 101, column 237
    pi@raspberrypi /ES-scraper $
    

    I am guessing I did something wrong but am not exactly sure what.

    #85674
    thadmiller
    Participant

    The error you encountered is due to invalid XML in es_systems.cfg.

    Have you updated the RetroPie Setup script and Binaries?

    $ sudo RetroPie-Setup/retropie_setup.sh
    
    UPDATE RetroPie Setup script
    --restart script--
    UPDATE RetroPie Binaries
    

    I believe that will fix your issue (and, in my experience, it also fixes many other issues with some of the emulators). Alternatively, you could try to fix the XML manually – just go to the location the script states: line 101, column 237 (but be warned, there will probably be a number of issues with the initial config).
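To find the spot the parser complains about, the line and column can be read from the exception itself. This is a minimal sketch, assuming the config is parsed with Python's standard `xml.etree.ElementTree` (as the traceback above suggests). `describe_xml_error` is a hypothetical helper name, not something from scraper.py.

```python
import xml.etree.ElementTree as ET


def describe_xml_error(xml_text):
    """Return None if xml_text parses, else (line, column, offending line)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        source_line = lines[line - 1] if 0 < line <= len(lines) else ""
        return line, column, source_line
    return None
```

Reading the contents of es_systems.cfg into a string and passing it to this function would show the first offending line. Later problems only surface once the earlier ones are fixed, so it may need several passes.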

    Thad

    #85732
    brakanje
    Participant

    I am using a heavily self-modified XML that is a combination of the PC XML and the XML that came with the image. As far as NP++ and I can see, the XML is valid, though. I’ll have another look to see what may be invalid about it.

    #85734
    brakanje
    Participant
    pi@raspberrypi /ES-scraper $ python scraper.py -pisize -l
    Traceback (most recent call last):
      File "scraper.py", line 586, in <module>
        ES_systems = readConfig(config)
      File "scraper.py", line 90, in readConfig
        platform = child.find('platform').text
    AttributeError: 'NoneType' object has no attribute 'text'

    Got through all of the errors where your code didn’t like double hyphens and double ampersands, and even removed every place where my theme was empty. I’m still getting this error, although now it’s not giving me any specifics.

    I would suggest that perhaps you rig your parser to ignore comments as that was half the trouble.

    #85752
    thadmiller
    Participant

    Your last error is due to a <system> (in your es_systems.cfg) missing a <platform> element – this is needed to determine what platform to scrape from.

    I have updated the script to ignore any systems missing information needed for scraping (this won’t fix the fact that information is missing, but it should skip over those systems, allowing any other correct systems to be scraped). You can easily get the updated script with:

    $ cd ES-scraper
    $ git pull
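The skip-incomplete-systems behaviour described in this post could look roughly like the sketch below. It is an illustration, not the actual scraper.py code. The function name `read_systems` is hypothetical, and the `REQUIRED` element names are assumed from the fields discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Assumed element names for a <system> entry in es_systems.cfg
REQUIRED = ("name", "path", "platform", "extension")


def read_systems(xml_text):
    """Return one dict per <system> that has every required element."""
    systems = []
    for system in ET.fromstring(xml_text).iter("system"):
        # findtext returns None when the child element is absent
        info = {tag: system.findtext(tag) for tag in REQUIRED}
        if all(info.values()):  # skips missing and empty elements alike
            systems.append(info)
    return systems
```

A system that lacks `<platform>` is simply left out of the result, so the remaining systems can still be scraped.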

    As for the double hyphens within comments, that is actually invalid XML, and it’s Python’s ElementTree parser (ET.parse, the XML parser used by the original author) choking on them. I may look at using a different, more forgiving XML parser or at parsing the file manually, but either option would require rewriting a decent chunk of the script.
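One way to make the parser "ignore comments", as requested earlier in the thread, is to strip them from the text before parsing. This is a rough sketch of that idea, not what scraper.py does; `parse_ignoring_comments` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Non-greedy match from "<!--" to the first following "-->"
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def parse_ignoring_comments(xml_text):
    """Drop every <!-- ... --> block, then parse what is left."""
    return ET.fromstring(COMMENT.sub("", xml_text))
```

This only helps with problems inside comments. Other issues, such as unescaped ampersands, would still raise a ParseError, and a regex like this would also remove comment-like text inside a CDATA section.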

    For a quicker alternative, I’m going to add arguments that allow a path, platform, and extensions to be specified. This will be slightly more manual, but will allow the script to run no matter how old or broken the es_systems.cfg file is.

    Thad

    #85757
    brakanje
    Participant

    Hrm. When I was at uni we were told that no matter what was in a comment it was valid code, as a comment should never be run by any parser/interpreter/compiler. That aside, even when those double hyphens are not in a comment your script seems to choke on them, probably because it’s still not valid XML. It is, however, needed when passing some Linux directives, unless of course the ES team wrote their parser to replace something else with the double hyphen needed. But that’s a slightly different topic. :P

    Just to be clear this whole time I have not been trying to be needy or demanding. I apologize if I at all came off poorly.

    #85760
    thadmiller
    Participant

    I agree completely, any text within a comment should be fine, but unfortunately that’s not the case: http://www.w3.org/TR/REC-xml/#sec-comments. If you ask me, it’s a “broken” definition. I suspect others feel the same way, and that’s probably why some other XML parsers ignore the -- within comments.
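The rule in the linked section of the spec applies to comments only. A quick check like the sketch below shows which of the cases discussed in this thread the standard ElementTree parser accepts. The sample strings are made up for illustration, and `is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if ElementTree accepts xml_text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


SAMPLES = [
    "<command>emu --fullscreen %ROM%</command>",    # "--" in element text
    "<a><!-- emu --fullscreen --></a>",             # "--" inside a comment
    "<command>cd /tmp && ./emu</command>",          # bare "&&"
    "<command>cd /tmp &amp;&amp; ./emu</command>",  # escaped "&&"
]

for sample in SAMPLES:
    print(is_well_formed(sample), sample)
```

With the standard parser, the first and last samples should be accepted and the middle two rejected. So a double hyphen in a command line is fine as element text, while an ampersand needs to be written as `&amp;`.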

    And you’re not coming off needy or demanding – I wrote the script (or modified, anyway) because I was unhappy with the speed and poor matching of every other scraper I could find, and I want it to be helpful for others too.

    #85767
    brakanje
    Participant

    OK, I had it running. My PuTTY window crashed, so I reopened it and tried running your script. “All Done”, but massive amounts of missing scrapes. So I deleted my gamelist directory and ran it again: still “All Done”. So I got it from git again, and same thing. I dunno how I could have broken it when I finally had it working. >.<

    #85778
    thadmiller
    Participant

    I’ve updated the script with optional arguments to ignore es_systems.cfg if the -name and -platform are manually specified. Ex:
    $ python scraper.py -pisize -l -name mame -platform arcade

    It’s a slightly more manual operation, but if these arguments are included, es_systems.cfg (and any of its issues) will be completely ignored.


    @brakanje, I don’t think you’re doing anything wrong. I’m only using TheGamesDB.net as the source, so it is likely that some of your games are missing (about 10 of my 170 games didn’t exist in their library), but if you have massive amounts of missing scrapes, I suspect something else may be the issue. If you wouldn’t mind running again with the -v option and capturing the output, I’d like to take a look at it to figure out what’s going on.

    thanks

    #85785
    brakanje
    Participant
    pi@raspberrypi /ES-scraper $ python scraper.py -pisize -l -v
    ES-scraper, a scraper for EmulationStation
    Using Raspberry Pi boxart size: (375px x 350px)
    Verbose mode enabled.
    All done!
    pi@raspberrypi /ES-scraper $
    

    Not sure how much that will help. That is some kind of verbose log indeed. :P

    #85787
    thadmiller
    Participant

    hah, true – looks like it didn’t find any ROMs at all.

    In that case, maybe you want to try the manual method (ignoring es_systems.cfg entirely)
    $ python scraper.py -pisize -v -l -name mame -platform arcade -rompath ~/RetroPie/roms/mame -ext ".zip .ZIP"
    (those would be the default values on my retropie installation searching for mame, obviously change your name, platform, rompath, and ext as necessary)
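For readers curious how such a manual mode might be wired up, here is a minimal sketch using `argparse`. Whether scraper.py uses argparse at all is an assumption. The option names are the ones shown in the command above, while `build_parser` and `manual_system` are hypothetical names.

```python
import argparse


def build_parser():
    """Parser for the four manual-mode options shown in the thread."""
    parser = argparse.ArgumentParser(description="manual-mode sketch")
    parser.add_argument("-name", help="system name, e.g. mame")
    parser.add_argument("-platform", help="platform to scrape, e.g. arcade")
    parser.add_argument("-rompath", help="directory holding the ROMs")
    parser.add_argument("-ext", help="space-separated extensions")
    return parser


def manual_system(args):
    """Return a system dict if the manual flags are given, else None."""
    if not (args.name and args.platform):
        return None  # caller would fall back to reading es_systems.cfg
    return {
        "name": args.name,
        "platform": args.platform,
        "path": args.rompath,
        "extensions": args.ext.split() if args.ext else [],
    }
```

When `manual_system` returns a dict, the config file never has to be opened, which is why a broken es_systems.cfg no longer matters in this mode.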

    In the meantime, I’ll add some more logging so we can see what platforms and paths it found (the verbose section, right now, just affects the scraping, but it looks like you’re having issues before it starts scraping at all).

    #85789
    thadmiller
    Participant

    Also, I just added those new arguments a couple hours ago – if you haven’t already, you’ll want to do another git pull to get those updates:

    
    $ cd ES-scraper
    $ git pull
    
    #85793
    brakanje
    Participant

    K, so manual mode is working. Not sure why auto mode wouldn’t be. Must be some error with my systems, though, if all we did was bypass systems. :P

    #85798
    thadmiller
    Participant

    I added some logging to the verbose mode while the script parses the es_systems.cfg file. For each <system> it should list name, path, platform, and ext (all the required data) along with the number of files found within the path (not exactly the ROM count, since it looks for any file rather than matching the extension, but it should be good enough for debugging).

    Just update the script with a
    $ git pull

    and run with the -v flag
    $ python scraper.py -pisize -v -l
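The per-system debug listing described in this post could be produced along these lines. This is a sketch under assumptions, not the real implementation: `describe_system` is a hypothetical name and the exact labels are illustrative.

```python
import os


def describe_system(name, path, platform, ext):
    """Build the lines of one SYSTEM block for verbose output."""
    try:
        entries = os.listdir(path)
    except OSError:
        entries = []  # a missing or unreadable directory counts as empty
    # Counts every regular file directly inside path, whatever its extension
    count = sum(os.path.isfile(os.path.join(path, e)) for e in entries)
    return [
        "SYSTEM:",
        "  Name: %s" % name,
        "  Path: %s" % path,
        "  Platform: %s" % platform,
        "  Ext: %s" % ext,
        "  Potential ROMs: %d" % count,
    ]
```

A count written this way only looks at the top level of the directory, so files kept in subfolders would not be included.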

    #85805
    brakanje
    Participant

    I’m gonna say I think it would be handy if there was a way to interrupt the script. :P I’ve been scraping for like an hour on NES and I’m going to be getting picked up in like half an hour. :P

    #85807
    thadmiller
    Participant

    you should be able to hit CTRL-C and it will exit cleanly (saving what has already been completed)

    #85829
    brakanje
    Participant

    When I hit CTRL+C it interrupts just that ROM and moves to the next one, sadly.

    #85830
    brakanje
    Participant
    pi@raspberrypi /ES-scraper $ python scraper.py -pisize -v -l
    SYSTEM:
      Name: amiga
      Path: /home/pi/RetroPie/roms/amiga
      Platform: amiga
      Ext: .adf .ADF
      Potential ROMs: 0
    SYSTEM:
      Name: atari800
      Path: /home/pi/RetroPie/roms/atari800
      Platform: atari800
      Ext: .xex .XEX
      Potential ROMs: 0
    SYSTEM:
      Name: atari2600
      Path: /home/pi/RetroPie/roms/atari2600
      Platform: atari2600
      Ext: .a26 .A26 .bin .BIN .rom .ROM .zip .ZIP .gz .GZ
      Potential ROMs: 0
    SYSTEM:
      Name: atari5200
      Path: /home/pi/RetroPie/roms/atari5200
      Platform: atari5200
      Ext: .a26 .A26 .bin .BIN .rom .ROM .zip .ZIP .gz .GZ
      Potential ROMs: 0
    SYSTEM:
      Name: atariststefalcon
      Path: /home/pi/RetroPie/roms/atariststefalcon
      Platform: atarist
      Ext: .st .ST .img .IMG .rom .ROM .ipf .IPF
      Potential ROMs: 0
    SYSTEM:
      Name: macintosh
      Path: /home/pi/RetroPie/roms/macintosh
      Platform: mac
      Ext: .txt
      Potential ROMs: 0
    SYSTEM:
      Name: c64
      Path: /home/pi/RetroPie/roms/c64
      Platform: c64
      Ext: .crt .CRT .d64 .D64 .g64 .G64 .t64 .T64 .tap .TAP .x64 .X64 .zip .ZIP
      Potential ROMs: 0
    SYSTEM:
      Name: amstradcpc
      Path: /home/pi/RetroPie/roms/amstradcpc
      Platform: cpc
      Ext: .cpc .CPC .dsk .DSK
      Potential ROMs: 0
    SYSTEM:
      Name: fba
      Path: /home/pi/RetroPie/roms/fba
      Platform: arcade
      Ext: .zip .ZIP .fba .FBA
      Potential ROMs: 0
    SYSTEM:
      Name: gb
      Path: /home/pi/RetroPie/roms/gb
      Platform: gb
      Ext: .gb .GB
      Potential ROMs: 0
    SYSTEM:
      Name: gba
      Path: /home/pi/RetroPie/roms/gba
      Platform: gba
      Ext: .gba .GBA
      Potential ROMs: 0
    SYSTEM:
      Name: sgb2
      Path: /home/pi/RetroPie/roms/gbc
      Platform: gbc
      Ext: .gbc .GBC
      Potential ROMs: 0
    SYSTEM:
      Name: gamegear
      Path: /home/pi/RetroPie/roms/gamegear
      Platform: gamegear
      Ext: .gg .GG
      Potential ROMs: 0
    SYSTEM:
      Name: intellivision
      Path: /home/pi/RetroPie/roms/intellivision
      Platform: intellivision
      Ext: .int .INT .bin .BIN
      Potential ROMs: 0
    SYSTEM:
      Name: mame
      Path: /home/pi/RetroPie/roms/mame
      Platform: arcade
      Ext: .zip .ZIP
      Potential ROMs: 0
    SYSTEM:
      Name: neogeo
      Path: /home/pi/RetroPie/roms/neogeo
      Platform: neogeo
      Ext: .zip .ZIP .fba .FBA
      Potential ROMs: 0
    SYSTEM:
      Name: nes
      Path: /home/pi/RetroPie/roms/nes
      Platform: nes
      Ext: .nes .unf .NES .UNF
      Potential ROMs: 0
    SYSTEM:
      Name: n64
      Path: /home/pi/RetroPie/roms/n64
      Platform: n64
      Ext: .z64 .Z64 .n64 .N64 .v64 .V64
      Potential ROMs: 0
    SYSTEM:
      Name: pcengine
      Path: /home/pi/RetroPie/roms/pcengine
      Platform: pcengine
      Ext: .pce .PCE
      Potential ROMs: 0
    SYSTEM:
      Name: scummvm
      Path: /home/pi/RetroPie/roms/scummvm
      Platform: pc
      Ext: .exe .EXE
      Potential ROMs: 0
    SYSTEM:
      Name: mastersystem
      Path: /home/pi/RetroPie/roms/mastersystem
      Platform: mastersystem
      Ext: .sms .SMS
      Potential ROMs: 0
    SYSTEM:
      Name: megadrive
      Path: /home/pi/RetroPie/roms/megadrive
      Platform: genesis,megadrive
      Ext: .smd .SMD .bin .BIN .gen .GEN .md .MD .zip .ZIP
      Potential ROMs: 0
    SYSTEM:
      Name: segacd
      Path: /home/pi/RetroPie/roms/segacd
      Platform: segacd
      Ext: .smd .SMD .bin .BIN .md .MD .zip .ZIP .iso .ISO
      Potential ROMs: 0
    SYSTEM:
      Name: sega32x
      Path: /home/pi/RetroPie/roms/sega32x
      Platform: sega32x
      Ext: .32x .32X .smd .SMD .bin .BIN .md .MD .zip .ZIP
      Potential ROMs: 0
    SYSTEM:
      Name: psx
      Path: /home/pi/RetroPie/roms/psx
      Platform: psx
      Ext: .img .IMG .7z .7Z .pbp .PBP .bin .BIN .cue .CUE
      Potential ROMs: 0
    SYSTEM:
      Name: snes
      Path: /home/pi/RetroPie/roms/snes
      Platform: snes
      Ext: .smc .sfc .fig .swc .SMC .SFC .FIG .SWC
      Potential ROMs: 0
    SYSTEM:
      Name: zxspectrum
      Path: /home/pi/RetroPie/roms/zxspectrum
      Platform: zxspectrum
      Ext: .z80 .Z80 .ipf .IPF
      Potential ROMs: 0
    SYSTEM:
      Name: vboy
      Path: /home/pi/RetroPie/roms/vboy
      Platform: nintendo-virtual-boy
      Ext: .vb .VB
      Potential ROMs: 0
    SYSTEM:
      Name: esconfig
      Path: /home/pi/RetroPie/roms/esconfig
      Platform: ignore
      Ext: .py .PY
      Potential ROMs: 0
    ES-scraper, a scraper for EmulationStation
    Using Raspberry Pi boxart size: (375px x 350px)
    Verbose mode enabled.
    All done!
    pi@raspberrypi /ES-scraper $
    

    More details but same result and no clear fix as everything looks right.

    #85862
    brakanje
    Participant

    Hey I just discovered your script seems to save images as JPEG or PNG but then reports them as JPG to the gamelist so all the images come up blank.

    #85885
    thadmiller
    Participant

    I’ll update CTRL-C to cancel out of all scraping.
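Behaviour like "CTRL-C only skips the current ROM" would be consistent with the interrupt being caught inside the per-ROM loop, though the thread does not confirm that. Here is a sketch of the alternative, where the handler wraps the whole loop. `scrape_all`, `scrape_one`, and `save` are hypothetical names.

```python
def scrape_all(roms, scrape_one, save):
    """Scrape each ROM; on CTRL-C stop everything and keep finished results."""
    done = []
    try:
        for rom in roms:
            done.append(scrape_one(rom))
    except KeyboardInterrupt:
        # The try wraps the whole loop, so the interrupt ends the run
        # instead of only skipping the current ROM.
        print("Interrupted, saving %d finished entries" % len(done))
    save(done)  # runs both after a normal finish and after an interrupt
    return done
```

Because `save` is called after the `try` block, whatever was finished before the interrupt is still written out.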

    So, looking at your verbose output, it looks like all your paths are correct(?), but zero files were found in each directory. That explains why nothing is scraped, but I don’t know why it would be finding zero files unless the paths are wrong.

    It’s odd that the gamelist contains a different extension than the actual file – it’s the same string that saves the file (technically, a rename, but whatever) that is written to the XML file. I’m not able to reproduce this, could you let me know a platform and ROM that didn’t work for you?
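The cause of the mismatch was never pinned down in this thread. One way to check what a downloaded file actually is, independent of its name, is to look at its first bytes. The sketch below uses the well-known PNG and JPEG signatures; `image_extension` is a hypothetical helper and not part of the scraper.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def image_extension(data):
    """Guess a file extension from the first bytes of image data."""
    if data.startswith(PNG_MAGIC):
        return ".png"
    if data.startswith(JPEG_MAGIC):
        return ".jpg"
    return None  # unknown format: the caller decides what to do
```

Comparing this result with the extension recorded in gamelist.xml would show which entries disagree with the files on disk.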

    #85889
    brakanje
    Participant

    It happened when I ran it on GoodNES. I used NP++ and ImageMagick to rectify it, which is no big deal, but figured I’d give you the heads up.

    #85960
    brakanje
    Participant

    It just occurred to me: could it see no files in the ROM directory and decide not to bother scanning the subdirectories?

    #85976
    thadmiller
    Participant

    Ah, yes, subfolders would cause an (easily fixable) issue. I didn’t even know ES would process them, but now that I know, it’s been fixed.

    Do another
    $ git pull
    and the scraper should be okay with subfolders.
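For reference, a recursive scan of this kind can be written with `os.walk`. The sketch below is an illustration of the idea and not necessarily how the fix was done in scraper.py; `find_roms` is a hypothetical name.

```python
import os


def find_roms(rom_dir, extensions):
    """Return paths of files under rom_dir (any depth) with a listed extension."""
    wanted = tuple(extensions)  # str.endswith accepts a tuple of suffixes
    found = []
    for dirpath, _dirnames, filenames in os.walk(rom_dir):
        for filename in filenames:
            if filename.endswith(wanted):
                found.append(os.path.join(dirpath, filename))
    return found
```

For example, `find_roms("/home/pi/RetroPie/roms/nes", ".nes .NES".split())` would also pick up ROMs stored in subfolders of the nes directory.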

    I’d still like to rectify the image discrepancy you ran into, but I’m not able to reproduce, and can’t think of any reason why it would do that. However, I wonder if paths, now being correct for your subfolders, might straighten things out a bit (but you’ll either want to add the -f parameter, or remove your old gamelist.xml files so you don’t have old stuff hanging around).

    #86005
    brakanje
    Participant

    When I’m done reimaging I’ll let you know if it happens again.

    #86062
    brakanje
    Participant

    Did it again and this time no issue. Very confusing. Is there some reason that it doesn’t go through folders in alphabetical order?

    #86068
    thadmiller
    Participant

    I’m going to guess the “confusing” part is the image extension discrepancy. If so, this may be due to other scrapers: I put in the effort to make sure this scraper plays nicely with the built-in ES scraper, but if another scraper was run, it could have created conflicting entries. The fact that running it on a fresh install produces good images seems to confirm that possibility. But if you do find that my scraper is causing an issue, I’d like to know the steps to reproduce, so I can fix it.

    As for the order the scraper processes – the order of systems in the es_systems.cfg defines the order the platforms are processed. The order the ROMs are processed within each platform folder is arbitrary (I’m not sorting it) – so the order is actually defined by the OS.
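If a predictable order is wanted, the directory listing can be sorted before it is processed. This is a small sketch of that idea, not something the scraper is stated to do; `list_sorted` is a hypothetical name.

```python
import os


def list_sorted(rom_dir):
    """List directory entries in case-insensitive alphabetical order."""
    # os.listdir returns entries in arbitrary, filesystem-defined order
    return sorted(os.listdir(rom_dir), key=str.lower)
```

When walking subfolders with `os.walk`, sorting its `dirnames` list in place would similarly control the order in which subfolders are visited.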

    #86088
    brakanje
    Participant

    Ahh. so that is probably how the sub-folders are being run as well then. :P

    #100736
    poochie
    Participant

    Sorry for bumping this ‘old’ thread.
    Running this script with python scraper.py -pisize -l gives me the following errors:

    Traceback (most recent call last):
      File "scraper.py", line 665, in <module>
        scanFiles(ES_systems[i])
      File "scraper.py", line 448, in scanFiles
        platforms = getPlatformNames(SystemInfo[3])
      File "scraper.py", line 435, in getPlatformNames
        for (i, platform) in enumerate(_platforms.split(',')):
    AttributeError: 'NoneType' object has no attribute 'split'

    How can I fix this?
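The traceback suggests that a system reached getPlatformNames with no platform value. In ElementTree, an element that is present but empty (for example `<platform></platform>`) has `.text` equal to `None`, so checking es_systems.cfg for a `<system>` with an empty or missing `<platform>` is a reasonable first step. A defensive version of the split might look like the sketch below; `split_platforms` is a hypothetical stand-in, not the real getPlatformNames.

```python
def split_platforms(platforms):
    """Split a comma-separated platform string; tolerate None or empty."""
    if not platforms:
        return []  # nothing to scrape for this system
    return [p.strip() for p in platforms.split(",") if p.strip()]
```

With a guard like this, a system without a platform yields an empty list and can be skipped instead of raising an AttributeError.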
