100% this. Every website is different, though after doing this kind of thing for long enough, you start to recognize common patterns and frameworks/libraries. Even general obfuscation can be reasonably reverse engineered with enough time and effort.
I agree that OP sounds like a beginner, and what you’ve suggested is likely the best approach for someone who is familiar with frontend tools and frameworks. Selenium (and admittedly BeautifulSoup) is probably too low level for this particular user, but that doesn’t mean they can’t still learn some fundamentals while solving this problem without resorting to something as heavy and complicated as background browser emulation and rendering. I could be wrong though.
I’m not currently on Discord, could you upload the code to pastebin or something similar?
I would love to see your code, but I understand if this forum isn’t the most ideal place to share.
In my experience, this scenario typically means that the page is calling some sort of API (very likely undocumented) behind the scenes. Finding it requires a bit more investigation and testing with browser developer tools, the JS Console, and often trial and error. But once you overcome that (admittedly very complex and technical) hurdle, you can almost always get away with just using the requests library.
I’ve had to do that kind of thing more times than I’d like to admit, but the juice is almost always worth the squeeze.
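For what it's worth, here is a rough sketch of what that usually ends up looking like once you've found the endpoint. Everything specific here is made up for illustration: the URL, the `page`/`per_page` parameters, and the `items` key in the response are assumptions, and the real ones are whatever shows up in the Network tab of your browser's developer tools. The function takes the session as an argument, so anything that behaves like `requests.Session` will work.

```python
from typing import Any, Dict, List

# Hypothetical endpoint; the real one comes from watching the Network tab
# in the browser developer tools while the page loads its data.
API_URL = "https://example.com/api/v1/items"


def fetch_items(session: Any, page: int = 1, per_page: int = 50) -> List[Dict[str, Any]]:
    """Fetch one page of records from the (assumed) JSON endpoint.

    `session` is expected to behave like requests.Session.
    """
    response = session.get(
        API_URL,
        params={"page": page, "per_page": per_page},  # assumed query parameters
        headers={"Accept": "application/json"},
        timeout=10,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    payload = response.json()
    return payload["items"]  # assumed response shape
```

With the real library that would be called as `fetch_items(requests.Session(), page=1)`. Some sites also expect particular headers or cookies, which is where the trial and error comes in.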
Selenium is really more of a testing framework for frontend developers, and could theoretically be used for scraping, but that would be somewhat like buying a car based on the paint and not looking in detail under the hood.
I can’t say I’ve ever worked with Scrapy, but the tool I would use for web scraping with Python is BeautifulSoup. This tutorial seems decent enough, but you will need to understand basic web concepts like IDs, classes, tags, and tag attributes to get the most out of it: https://geekpython.medium.com/web-scraping-in-python-using-beautifulsoup-3207c038723b
W3Schools will also be your friend if you have questions about HTML/CSS selectors in general: https://www.w3schools.com/html/default.asp
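To give you a feel for how IDs, classes, tags, and attributes map onto BeautifulSoup calls, here is a small sketch. The HTML is invented for the example (your page will look different), and it assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a page you have already downloaded.
HTML = """
<div id="catalog">
  <p class="item" data-sku="A1"><a href="/items/a1">Widget</a></p>
  <p class="item" data-sku="B2"><a href="/items/b2">Gadget</a></p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

catalog = soup.find(id="catalog")                  # look up an element by ID
items = catalog.find_all("p", class_="item")       # filter by tag name and class
names = [p.a.get_text(strip=True) for p in items]  # text of the nested <a> tag
skus = [p["data-sku"] for p in items]              # read a tag attribute
links = [a["href"] for a in soup.select("#catalog p.item a")]  # same idea via a CSS selector
```

The `select` line uses the same selector syntax the W3Schools pages describe, so whatever you learn there carries over directly.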
Understanding regular expressions and/or XPath would also be very helpful, but is probably best considered extra credit in most cases.
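If you do want a taste of the extra credit, here is a tiny sketch using only the standard library. The snippet is made up, and note the assumptions: `xml.etree.ElementTree` only supports a limited subset of XPath and needs well-formed markup, so for real-world HTML you would more likely reach for something like lxml.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet; real pages are rarely this tidy.
SNIPPET = (
    '<ul>'
    '<li class="price">$19.99</li>'
    '<li class="price">$5.00</li>'
    '<li class="note">n/a</li>'
    '</ul>'
)

root = ET.fromstring(SNIPPET)

# XPath-style query: every <li> whose class attribute is "price".
price_cells = root.findall(".//li[@class='price']")

# Regular expression: pull the numeric part out of each cell's text.
prices = [float(re.search(r"\d+\.\d{2}", li.text).group()) for li in price_cells]
```

The XPath expression picks the elements and the regex cleans up the text inside them, which is the usual division of labor between the two.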
I’ll try to respond if you have any issues or questions, but hopefully that gives you enough to get started.
I feel like this question is too vague to be answered with any substance. Where do I draw the line in what context? Technology? Dating? Politics? Family? Social media? Food? Etc.
I draw the line at answering unanswerable questions.