Jappe
8y

!Rant

Wrote a crawler that now has 18 million records in the queue and about 500,000 files with metadata.
1 month until the deadline and we still have a shitload of things to do.

Now we discover a flaw in our crawler (I don't see it as a bug). We don't know how much metadata we missed, so now we have to write a script that re-scrapes every webpage we've already visited and pulls that metadata again.

What's the flaw, you ask? Some people find it funny to put capital letters in their attribute names.. *cough* Microsoft.com!! *cough*

And what didn't we do? We didn't lowercase each webpage first and only then search it for data..
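For anyone hitting the same thing: a minimal sketch of the case-insensitive approach in Python (the scrape_meta helper and the sample HTML are just illustrations, not our crawler's actual code). The stdlib parser already normalises tag and attribute names, so you only have to lowercase the values you care about:

```python
from html.parser import HTMLParser

# Minimal sketch, assuming the goal is pulling <meta> tags out of pages
# already crawled. Python's html.parser lowercases tag and attribute NAMES
# for you, so <META NAME="Description"> and <meta name="description"> look
# the same to the handler below.

class MetaScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":              # tag names arrive lowercased
            return
        attrs = dict(attrs)            # attribute names arrive lowercased too
        name = attrs.get("name")
        if name:
            # Lowercase the attribute VALUE as well, so "Description" and
            # "description" end up under the same key.
            self.meta[name.lower()] = attrs.get("content", "")

def scrape_meta(html: str) -> dict:
    scraper = MetaScraper()
    scraper.feed(html)
    return scraper.meta

print(scrape_meta('<META NAME="Description" CONTENT="Shouty markup, matched anyway">'))
# {'description': 'Shouty markup, matched anyway'}
```

Lowercasing only the names (rather than the whole page) also keeps the metadata values themselves intact.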

Comments
  • Never assume casing; it will find a way to bite you :/
  • Yeah, lesson learned. Curious, what are you building?
  • @Jumpshot44 A webcrawler
  • Always assume that somewhere along the way you're going to have to deal with an idiot's input.

    Oh, and just so you know, constantly repeating the mantra "garbage in, garbage out" directly to said idiot only seems to lead to worse input.