New version of sselph/scraper v0.9.0-beta

Viewing 35 posts - 1 through 35 (of 59 total)
    #101350
    sselph
    Participant

    Hi Everyone,

    I’ve been refactoring much of my scraper’s code to make it easier to add features, and I’ve added a few new features since I last posted.
    https://github.com/sselph/scraper

    New Features:

    • MAME/Arcade descriptions – I added information from arcade-history so that MAME and other arcade systems have more complete data.
    • PSX Support – I added support for bin/cue PSX games from redump dat files. It will create a single entry for each cue file.
    • Dreamcast Support – I added support for gdi/bin games from redump dat files. It seems reicast supports this format but it isn’t enabled in es_systems.cfg.
    • Zip/Gzip support – since RetroArch added zip/gzip support, I now look inside zip files for the first file that looks like a ROM and scan it.
    • More accurate and complete scraping on several systems. Thanks to @robertybob for adding literally ~1000 games to thegamesdb.
    • Ability to append to a gamelist – you can now use -append to skip files that are already in the gamelist.xml file.
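    To make the zip handling above a little more concrete, here is a minimal Go sketch that walks a zip archive, picks the first entry whose extension looks like a ROM, and hashes it. The extension list, the use of SHA-1, and the names `firstROMInZip`, `zipEntry`, and `makeZip` are assumptions for illustration rather than the scraper's actual code, and gzip handling is not shown.

```go
package main

import (
	"archive/zip"
	"bytes"
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"path"
	"strings"
)

// romExts is a hypothetical set of extensions treated as "looks like a ROM".
var romExts = map[string]bool{".nes": true, ".sfc": true, ".smc": true, ".gb": true, ".md": true}

// firstROMInZip returns the name and hex SHA-1 of the first archive entry
// whose extension is in romExts. SHA-1 is an assumed hash choice.
func firstROMInZip(r *zip.Reader) (string, string, error) {
	for _, f := range r.File {
		if !romExts[strings.ToLower(path.Ext(f.Name))] {
			continue // skip readmes, nfo files, etc.
		}
		rc, err := f.Open()
		if err != nil {
			return "", "", err
		}
		h := sha1.New()
		_, err = io.Copy(h, rc)
		rc.Close()
		if err != nil {
			return "", "", err
		}
		return f.Name, hex.EncodeToString(h.Sum(nil)), nil
	}
	return "", "", errors.New("no ROM-like file in archive")
}

// zipEntry is a name/contents pair used to build a demo archive.
type zipEntry struct{ name, body string }

// makeZip builds an in-memory zip archive from entries, in order.
func makeZip(entries []zipEntry) (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, e := range entries {
		w, err := zw.Create(e.name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, e.body); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	zr, err := makeZip([]zipEntry{{"readme.txt", "hi"}, {"game.nes", "fake rom data"}})
	if err != nil {
		panic(err)
	}
	name, sum, err := firstROMInZip(zr)
	fmt.Println(name, sum, err)
}
```

    Because only the first matching entry is used, an archive holding several ROMs would be identified by whichever one comes first in the zip's directory order.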

    Guide:
    Thanks to Floob there is a very nice video guide that is still valid:

    Issues:
    Since I’ve changed most of the code and don’t have a lot of tests, I’m sure I have created bugs. Please create issues here:
    https://github.com/sselph/scraper/issues

    #101360
    Floob
    Member

    Thanks very much for the update. It’s great!
    Loving the extra description detail on MAME ROMs.

    I’ve added an error I found to your issue list; it may just be me doing something odd, though.

    I like the PSX support, although since I have very few PSX games and I don’t use a .cue for single-track games, I’ll probably still use the built-in scraper ad hoc for those.

    Thanks again for all the work you put into this, it makes Emulation Station so much nicer to use.

    I’ll try to sort a new video for all these updates!

    #101368
    sselph
    Participant

    Thanks for the report. I’ll release a fix soon if I don’t hear any other issues.

    Regarding bin/cue: the scraper will still scrape the bin file if there isn’t a cue file. It looks for cue files, parses them, and gets the list of associated bin files. It then hashes the cue file, then track 1, track 2, etc. until it finds a match, and uses that. So if there isn’t a cue file, it just treats the .bin as a plain binary and hashes it as normal.

    #101369
    Floob
    Member

    Ah I see – that’s great. I’ll give it a go.

    Can you remind me how the mame lookup works – which database does it check?

    For example, I’ve got ddp2.zip which is:
    http://www.progettoemma.net/index.php?gioco=ddp2&lang=en

    but nothing scraped?

    #101370
    Floob
    Member

    Scrap that – it found it this time – just no image returned.

    One it didn’t find was wyvernf0.zip:
    http://www.progettoemma.net/index.php?gioco=wyvernf0&lang=en

    #101371
    sselph
    Participant

    It uses mamedb.com. It strips off the file extension and pulls the url http://www.mamedb.com/game/wyvernf0
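    In other words, the lookup key is just the file name minus its extension. A minimal Go sketch of that step, where `mamedbURL` is a hypothetical helper name and the URL pattern is the one quoted above:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// mamedbURL builds the lookup URL from a ROM file name by dropping any
// directory and the file extension (hypothetical helper).
func mamedbURL(romFile string) string {
	base := filepath.Base(romFile)
	name := strings.TrimSuffix(base, filepath.Ext(base))
	return "http://www.mamedb.com/game/" + name
}

func main() {
	fmt.Println(mamedbURL("wyvernf0.zip")) // prints: http://www.mamedb.com/game/wyvernf0
}
```

    A ROM added to MAME after the site's version, such as wyvernf0, still produces a well-formed URL; the page just won't have an entry for it.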

    mamedb.com uses MAME .147, and wyvernf0 is from .154, so it isn’t in their database.

    #101372
    Floob
    Member

    Also, when processing mame4all ROMs I periodically get these errors:

    I don’t think it’s ROM-specific, though, as it’s a consecutive batch; on the next scrape those are fine and others fail.

    /07/05 01:47:12 INFO: Starting: bosco.zip
    2015/07/05 01:47:12 ERR: error processing bosco.zip: ILM Bad HTML
    2015/07/05 01:47:12 INFO: Starting: bouldash.zip
    2015/07/05 01:47:12 ERR: error processing bouldash.zip: ILM Bad HTML
    2015/07/05 01:47:12 INFO: Starting: bouldash.zip
    2015/07/05 01:47:12 ERR: error processing bouldash.zip: ILM Bad HTML
    2015/07/05 01:47:12 INFO: Starting: bouldash.zip
    2015/07/05 01:47:12 ERR: error processing bouldash.zip: ILM Bad HTML
    2015/07/05 01:47:12 INFO: Starting: brain.zip
    2015/07/05 01:47:13 ERR: error processing brain.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: brain.zip
    2015/07/05 01:47:13 ERR: error processing brain.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: brain.zip
    2015/07/05 01:47:13 ERR: error processing brain.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: breakers.zip
    2015/07/05 01:47:13 ERR: error processing breakers.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: breakers.zip
    2015/07/05 01:47:13 ERR: error processing breakers.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: breakers.zip
    2015/07/05 01:47:14 ERR: error processing breakers.zip: ILM Bad HTML
    2015/07/05 01:47:14 INFO: Starting: brkthru.zip
    2015/07/05 01:47:14 ERR: error processing brkthru.zip: ILM Bad HTML
    2015/07/05 01:47:14 INFO: Starting: brkthru.zip
    2015/07/05 01:47:14 ERR: error processing brkthru.zip: ILM Bad HTML
    2015/07/05 01:47:14 INFO: Starting: brkthru.zip
    2015/07/05 01:47:14 ERR: error processing brkthru.zip: ILM Bad HTML
    2015/07/05 01:47:14 INFO: Starting: brubber.zip
    2015/07/05 01:47:15 ERR: error processing brubber.zip: ILM Bad HTML
    2015/07/05 01:47:15 INFO: Starting: brubber.zip
    
    #101373
    Floob
    Member

    [quote=101371]It uses mamedb.com. It strips off the file extension and pulls the url http://www.mamedb.com/game/wyvernf0

    [/quote]

    Ah – ok, that explains it. Thanks.

    #101375
    sselph
    Participant

    Hmm, those errors come from the MAME scraper fetching the URL and getting a response it can’t parse. Since it happens with different ROMs and in bursts, it might be throttling or a problem with the website.

    #101376
    Floob
    Member

    Could a backupdb query work like this?

    http://www.progettoemma.net/gioco.php?game=wyvernf0

    with the image being:
    http://www.progettoemma.net/snap/wyvernf0/0000.png

    Just a thought. I’m more than impressed with what it does already!

    #101377
    sselph
    Participant

    Yeah, we can create a backup DB. For the metadata, I could probably download another dat file, parse it, and put it in the same data store I’m using for the history data, then either point to images on another site or see how taxing it would be to host them myself.
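    The backup-DB idea amounts to trying metadata sources in order. A minimal Go sketch, assuming each provider can be wrapped as a lookup function; the `source` type, `lookupWithFallback`, and `emmaURLs` are hypothetical names, and the Progetto EMMA URL patterns are the ones Floob suggested above, not a documented API:

```go
package main

import (
	"errors"
	"fmt"
)

// source is a hypothetical metadata provider wrapped as a lookup function.
type source struct {
	name   string
	lookup func(rom string) (string, error)
}

// errNotFound is returned when no source knows the ROM.
var errNotFound = errors.New("not found in any source")

// lookupWithFallback tries each source in order and returns the name of the
// first source that answers, together with its result.
func lookupWithFallback(rom string, sources []source) (string, string, error) {
	for _, s := range sources {
		if v, err := s.lookup(rom); err == nil {
			return s.name, v, nil
		}
	}
	return "", "", errNotFound
}

// emmaURLs builds the page and snapshot URLs suggested in the thread.
func emmaURLs(rom string) (page, snap string) {
	return "http://www.progettoemma.net/gioco.php?game=" + rom,
		"http://www.progettoemma.net/snap/" + rom + "/0000.png"
}

func main() {
	primary := source{"mamedb", func(rom string) (string, error) {
		return "", errNotFound // e.g. a ROM newer than the site's MAME version
	}}
	backup := source{"progettoemma", func(rom string) (string, error) {
		page, _ := emmaURLs(rom)
		return page, nil
	}}
	from, page, err := lookupWithFallback("wyvernf0", []source{primary, backup})
	fmt.Println(from, page, err) // prints: progettoemma http://www.progettoemma.net/gioco.php?game=wyvernf0 <nil>
}
```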

    #101378
    Floob
    Member

    [quote=101375]Hmm those errors are from the mame scraper trying to parse the result of getting the URL and getting a response it can’t parse. Since it happens with different roms and in bursts might be some throttling or issues with the website.

    [/quote]

    Just tried it again, and it’s fine now. Must have been a temporary bottleneck, like you said.

    #101396
    Floob
    Member

    Just had a major meltdown with some atarilynx rom scraping which seemed fine before. Can you see where the issue may be?

    github.com/sselph/scraper/ds.(*Hasher).Hash(0x1080aa90, 0x10f1d320, 0x23, 0x0, 0x0, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/ds/hasher.go:32 +0x170 fp=0x1a462a4c sp=0x1a4629e0
    github.com/sselph/scraper/ds.(*Hasher).Hash(0x1080aa90, 0x10f1d320, 0x23, 0x0, 0x0, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/ds/hasher.go:32 +0x170 fp=0x1a462ab8 sp=0x1a462a4c
    github.com/sselph/scraper/ds.(*Hasher).Hash(0x1080aa90, 0x10f1d320, 0x23, 0x0, 0x0, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/ds/hasher.go:32 +0x170 fp=0x1a462b24 sp=0x1a462ab8
    ...additional frames elided...
    created by main.CrawlROMs
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:173 +0x5e4
    
    goroutine 1 [chan send]:
    main.CrawlROMs(0x11522cc0, 0x10a48010, 0x1, 0x1, 0x10810140, 0x1080aa88, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:184 +0xf98
    main.Scrape(0x10a48010, 0x1, 0x1, 0x10810140, 0x1080aa88, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:285 +0x194
    main.main()
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:414 +0xf54
    
    goroutine 5 [syscall]:
    os/signal.loop()
            /usr/local/go/src/os/signal/signal_unix.go:21 +0x1c
    created by os/signal.init·1
            /usr/local/go/src/os/signal/signal_unix.go:27 +0x40
    
    goroutine 15 [chan receive]:
    main.func·003()
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:187 +0x60
    created by main.CrawlROMs
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:184 +0x938
    
    goroutine 14 [chan receive]:
    main.func·002()
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:177 +0x94
    created by main.CrawlROMs
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:180 +0x6b8
    
    goroutine 10 [select]:
    net.func·019()
            /usr/local/go/src/net/dnsclient_unix.go:241 +0x310

    #101406
    sselph
    Participant

    Thanks!

    I think I’ve found the error; I’ve submitted a fix and am releasing a new version. Hopefully I catch all the issues before I hit 1.0.0 :)

    #101522
    robertybob
    Participant

    Keep up the great work Sselph! If ever you want to add more systems and want someone to help you match up IDs or whatever, just ask me :)

    #101618
    ekstreme
    Participant

    Working well for me. Just scraped my GnGeo set.

    #102257
    socalretrogamer
    Participant

    Thanks for this scraper! It works great! Much, much better than the scraper on Emulation Station. There are still a lot of games it didn’t scrape, but I think that’s because some of the ROM file names are truncated. For example, “Zelda2” (no space) didn’t scrape for NES. At some point, I plan on renaming all the files that didn’t scrape, so would you have any tips to ensure the scraper recognizes the title? Particularly a sequel game like Zelda 2? Thanks again!

    #102273
    sselph
    Participant

    Hi socalretrogamer,

    To minimize false positives, on consoles I’m not using the file name at all, only the extension, so you could name them 1.nes, 2.nes, and it should still work. The scraper uses the ROM data itself: it hashes it and compares the hash against a hash-to-ID mapping database I generated by hand for each system it supports.
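    A minimal Go sketch of the content-based lookup described above, assuming SHA-1 as the hash (an assumption; the post does not say which hash is used). `hashDB`, `romHash`, and `lookupROM` are hypothetical names, and the ID in the demo is a placeholder:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// hashDB stands in for the hand-built hash -> thegamesdb ID table.
type hashDB map[string]string

// romHash returns the hex SHA-1 of the ROM contents (SHA-1 is assumed).
func romHash(rom []byte) string {
	sum := sha1.Sum(rom)
	return hex.EncodeToString(sum[:])
}

// lookupROM identifies a ROM purely by its contents; the file name is never consulted.
func lookupROM(db hashDB, rom []byte) (string, bool) {
	id, ok := db[romHash(rom)]
	return id, ok
}

func main() {
	dump := []byte("pretend this is a No-Intro dump")
	db := hashDB{romHash(dump): "12345"} // placeholder ID, not a real entry
	// The same bytes under any file name ("Zelda2.nes", "1.nes") give the same ID.
	fmt.Println(lookupROM(db, dump))
	// A hacked or overdumped ROM has different bytes, so it misses.
	modified := append(append([]byte(nil), dump...), 0xFF)
	fmt.Println(lookupROM(db, modified))
}
```

    This is also why renaming files does not help: only a byte-for-byte match against the table counts.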

    So there are several reasons it may not have scraped a rom:

    • Different ROM dump and therefore a different hash. The Zelda2 file could be bad, hacked, overdumped, or a revision not in the No-Intro hashes I used.
    • No entry in thegamesdb. For SNES there are 3385 No-Intro ROMs but only 1055 games in the GDB; counting clones, I matched 2434.
    • No entry in my DB. Because I have to add each hash-to-ID mapping by hand, new entries don’t show up automatically.

    #102294
    gutossn
    Participant

    The scraper is amazing! Very fast, and it doesn’t freeze ES. Could you add WonderSwan and Neo Geo Pocket (and Color too) to the database? Thank you.

    #102297
    sselph
    Participant

    I have issues tracking adding new systems on github. It is a function of: are there available hashes, what are the file formats, are there entries in thegamesdb.net, how many games, how busy I am, etc.

    Feel free to add issues for each system but I can’t make any promises until I look more closely.

    #103154
    Anonymous
    Inactive

    Hi sselph!!
    First of all, many thanks for this awesome scraper!!!

    I have one question I can’t find a solution to (maybe I’m too much of a newbie ;)

    If I start a scraper session and, for any reason (e.g. I abort execution with Ctrl+C, or the scraper shows errors and exits), the scraper doesn’t finish a complete ROM directory, how can I continue the session without re-analyzing all the ROMs that were already scraped correctly?

    Thanks again for your hard work with this great super-tool!! :)

    *EDIT*
    OK, I think I need to use the -append=true parameter…

    #103156
    sselph
    Participant

    Hi,

    Yes, the -append flag should be what you’re looking for, although the scraper skips downloading any images that already exist, so it should be fast to catch back up either way.

    I have too many flags :)

    #104126
    Omnija
    Participant

    Will there be support for psx .pbp formats?

    #104178
    sselph
    Participant

    I don’t know enough about the pbp file format to know if I could translate the information it contains to what would have been in the original bin file to match it against the hash in redump.

    #104268
    Anonymous
    Inactive

    Great work on version 1.0.0, sselph!!

    I have a question. I have a complete collection of PAL Mega Drive box art. Why, you may ask? Well, the PAL box art is much more appealing to me (being from the UK), and it actually says Mega Drive on it. Is there any way we could implement scraping just PAL box art for the Mega Drive? I can upload these images to a place of your choosing, if that would help bring this idea to reality.

    #104281
    sselph
    Participant

    There are a couple of issues with the whole Mega Drive/Genesis situation. The first is that when I did the mapping from hash to thegamesdb ID, I didn’t really care which version I chose as long as there was a match. So if there was a US version and an EU version, I just chose one at random; sometimes I looked to see which one had the best description or clearer image. The other issue is data quality in thegamesdb: several Mega Drive games have Genesis art, and possibly vice versa.

    When I have time to remap MD and GEN, I’ll take more care to only give an MD version a GEN match if there isn’t an MD entry in the DB, and vice versa. Ideally we could get the entries in thegamesdb fixed and improved so that other projects benefit as well.
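    The remapping rule described above (only give an MD version a GEN match when no MD entry exists, and vice versa) could look something like this. A Go sketch only; the `entry` type, `pickMatch`, the platform strings, and the IDs are made up for illustration:

```go
package main

import "fmt"

// entry is a hypothetical candidate match from thegamesdb.
type entry struct {
	ID       string
	Platform string
}

// pickMatch prefers a candidate on the ROM's own platform and falls back to
// the first other-region candidate only when no same-platform entry exists.
func pickMatch(want string, candidates []entry) (entry, bool) {
	for _, c := range candidates {
		if c.Platform == want {
			return c, true
		}
	}
	if len(candidates) > 0 {
		return candidates[0], true
	}
	return entry{}, false
}

func main() {
	cands := []entry{{"100", "Sega Genesis"}, {"200", "Sega Mega Drive"}}
	fmt.Println(pickMatch("Sega Mega Drive", cands)) // prints: {200 Sega Mega Drive} true
	fmt.Println(pickMatch("Sega Genesis", cands[1:])) // prints: {200 Sega Mega Drive} true
}
```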

    I have tinkered with the idea of setting up a repository of my own to improve some of the MAME stuff but haven’t had time. If I do, I’ll see if I could do something similar for other systems but I imagine the cost would be prohibitive and I won’t actually do any of it :)

    #104523
    greyhulk
    Participant

    Hi guys, I’m using the built-in scraper on PSX games. It finds the relevant artwork etc., but when I restart my Pi it’s all missing again. Any advice?

    thanks
    steve

    #104533
    herbfargus
    Member

    It may not be writing manual changes unless you cleanly exit EmulationStation. So select Quit EmulationStation from the Start menu, and when it reloads, see if your changes were saved.

    #104907
    Anonymous
    Inactive

    Is there a build for Windows at all?

    #104946
    sselph
    Participant

    I make several prebuilt binaries available at https://github.com/sselph/scraper/releases

    Or, if you’re the type who likes compiling it yourself, there are no special instructions for doing it on Windows.

    #105092
    Anonymous
    Inactive

    Nice!, thanks

    #109770
    phantom27
    Participant

    OK… So I might be dumb… No… I’m pretty sure I am… but I need help.

    I have a ROM database that I tried running this on, on my Mac. It looked like it worked; it even said saving session, etc. But I can’t find the gamelist.xml file. I even searched my Mac for it.

    I’m probably doing something wrong.

    #109771
    phantom27
    Participant

    Yep, I’m an idiot apparently. I didn’t realize it would put it in my ‘home’ folder. Found it.

    OK, stupid question: if I put this file in my ROM folder on my Pi, will it work, or are the paths all messed up since I ran it on my Mac?

    #110041
    sselph
    Participant

    Hmm, the gamelist should be in the same directory where you ran the script. I’ve heard some other complaints about this, so maybe something has changed.

    Anyway, if you ran the script from inside a folder with a bunch of ROMs and didn’t change any of the flags, all the paths should be correct: just put the gamelist in the ROM folder along with all the ROMs and the images folder.
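    Presumably the paths survive the move because they are written relative to the ROM folder rather than as absolute Mac paths. A minimal Go sketch of that conversion, assuming entries use the `./name` form; `gamelistPath` is a hypothetical helper, and the scraper's actual flags and defaults may differ:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// gamelistPath turns an absolute ROM path into a "./"-prefixed path relative
// to the ROM directory, so the entry does not depend on where the folder lives.
func gamelistPath(romDir, romPath string) (string, error) {
	rel, err := filepath.Rel(romDir, romPath)
	if err != nil {
		return "", err
	}
	return "./" + filepath.ToSlash(rel), nil
}

func main() {
	p, err := gamelistPath("/Users/me/roms/nes", "/Users/me/roms/nes/Zelda2.nes")
	fmt.Println(p, err) // prints: ./Zelda2.nes <nil>
}
```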

    #114810
    proxycell
    Participant

    Hey Steven,
    Long time since I last used your scraper.

    I hope this thread is the one to be used for such things:

    How would I go about ADDING to this database? I have every fan-translated game there is, and I would love for them to be scraped as the original game.

  • The forum ‘Everything else related to the RetroPie Project’ is closed to new topics and replies.