Check-in [1dcb8f6e6b]
Overview
Comment: | update selectors for extraction |
---|---|
SHA1: | 1dcb8f6e6b07c8bedf26cc545a72819c |
User & Date: | mario on 2022-02-16 08:54:10 |
Context
2022-02-16
Time | Comment | Check-in | User | Tags |
---|---|---|---|---|
21:08 | remove docblock, reinstate max_streams for loop | e9dc9616e1 | mario | trunk |
08:54 | update selectors for extraction | 1dcb8f6e6b | mario | trunk |
08:20 | updated key mapping | 01a94c1fb6 | mario | trunk |
Changes
Modified contrib/housemixes.py from [60f4ff2e4d] to [b4dd575f14].
Lines 1–14 of the new version:

```python
# encoding: UTF-8
# api: streamtuner2
# title: house-mixes
# description: UK DJs house/techno mixes
# type: channel
# category: collection
# version: 0.7
# url: http://www.house-mixes.com/
# config:
# ( -x-off-name: housemixes_pages, type: int, value: 5, description: maximum number of pages to scan )
# priority: contrib
# png:
# iVBORw0KGgoAAAANSUhEUgAAABQAAAASBAMAAACp/uMjAAAAGFBMVEUIBQE7KCGSPABUWFrwcACIkJayuLr3+vjaVnR4AAAAAWJLR0QAiAUdSAAAAAlwSFlzAAALEwAACxMB
# AJqcGAAAAAd0SU1FB+AKCQwWONCoEiQAAACPSURBVAjXLY29DsIgFIUPxehaExJXW+EB7MJaCE/gAKvRaNdWYu/rewve6bsn5wcARN+3KNeE7hgq60t8X30Rx4mIhjOj2tMr
```
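The leading comments of the plugin are key/value metadata fields (`# api: streamtuner2`, `# type: channel`, `# version: 0.7`, …). As an illustration only, here is a minimal sketch of collecting such single-line `# key: value` fields into a dict, assuming a simplified format. `parse_meta` is a hypothetical helper, not streamtuner2's own parser, and multi-line fields such as `config:` and `png:` are deliberately left out of the sample.

```python
import re

# Sample header modelled on the lines shown in this hunk (config/png omitted).
HEADER = """\
# encoding: UTF-8
# api: streamtuner2
# title: house-mixes
# description: UK DJs house/techno mixes
# type: channel
# category: collection
# version: 0.7
# url: http://www.house-mixes.com/
# priority: contrib
"""


def parse_meta(source):
    """Collect leading '# key: value' comment lines into a dict.

    Hypothetical, simplified parser: it stops at the first line that is not
    a comment and only handles single-line fields.
    """
    meta = {}
    for line in source.splitlines():
        if not line.startswith("#"):
            break  # metadata block ends at the first non-comment line
        m = re.match(r"#\s*([\w-]+):\s*(.*)$", line)
        if m:
            meta[m.group(1)] = m.group(2).strip()
    return meta


meta = parse_meta(HEADER)
print(meta["title"], meta["version"])  # house-mixes 0.7
```

Because the loop stops at the first non-comment line, a `# key:` comment further down in the module body is never picked up.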
︙
Lines 76–148 of the new version:

```python
        if not cat in self.catmap:
            return

        # collect
        self.status(0.0)
        html = ahttp.get(self.base_url + self.catmap[cat])
        max = int(conf.max_streams) / 50   # or enable conf.housemixes_pages?
        for i in range(2, 3):#int(max)):
            self.status(float(i) / max)
            if html.find("latest/" + str(i)):
                html = html + ahttp.get(self.base_url + self.catmap[cat] + "/latest/%s" % i)
        html = re.sub("</body>.+?<body>", "", html, 100, re.S)
        self.status("Extracting mixes…")

        """
        <div class="mix-item ">
            <div class="mix-item-inner">
                <div class="img-container">
                    <a href="/profile/DJ%20RIKKI/play/different-mix-2"><img src="https://static.house-mixes.com/s3/webmixes-images/accounts-152296/profileMain.jpg?width=360&quality=45&crop=true" data-src="https://static.house-mixes.com/s3/webmixes-images/accounts-152296/profileMain.jpg?width=360&quality=45&crop=true" alt="different mix (2)" class="img-responsive unveil-loaded"></a>
                </div>
                <h3><a href="/profile/DJ%20RIKKI/play/different-mix-2" title="different mix (2)"><span>different mix (2)</span></a></h3>
                <h4>By <span itemtype="http://schema.org/MusicGroup"><a href="/profile/DJ%20RIKKI"><span>DJ RIKKI</span></a></span></h4>
                <i class="fa fa-time"></i> <span class="fromNow media-time" title="2021-09-21T01:42:08" data-livestamp="2021-09-21T01:42:08">21/09/2021 01:42:08 +01:00</span>
                <div class="mix-item-genre">
                    Chicago House
                </div>
                <div class="media-stats">
                    <i class="fa fa-play-circle"></i> <a href="/stats/dj%20rikki/different-mix-2/plays/1"><span class="media-plays" data-rel="tooltip" title="" data-original-title="Total Plays: 141 - Weekly Plays: 141">141</span></a>
                    <i class="fa fa-download"></i> <a href="/stats/dj%20rikki/different-mix-2/downloads/1"><span class="media-downloads" data-rel="tooltip" title="" data-original-title="Total Downloads: 2 - Weekly Downloads: 9">2</span></a>
                    <i class="fa fa-thumbs-up"></i> <a href="/stats/dj%20rikki/different-mix-2/likes/1"><span class="media-likes" data-rel="tooltip" title="" data-original-title="Total Likes: 0 - Weekly Likes: 0">0</span></a>
                    <i class="fa fa-heart"></i> <a href="/stats/dj%20rikki/different-mix-2/favourites/1"><span class="media-favourites" data-rel="tooltip" title="" data-original-title="Total Favourites: 0 - Weekly Favourites: 0">0</span></a>
                    <i class="fa fa-comments-o"></i> <a href="/profile/DJ%20RIKKI/play/different-mix-2"><span class="media-comments" data-rel="tooltip" title="" data-original-title="Total Comments: 0 - Weekly Comments: 0">0</span></a>
                </div>
                <!--<div class="media-tags">
                    <i class="fa fa-folder-close-alt"></i> <span class="media-genre"><a href="/djmixes/chicago-house-dj-mixes" title="View Chicago House Mixes">Chicago House</a></span>
                    <div>
                        <i class="fa fa-tags"></i> <a href="/tags/bad%20boy%20bill">bad boy bill</a>
                    </div>
                </div>-->
            </div>
        </div>
        """

        # extract
        for card in pq(html).find(".mix-item"):
            print(card)
            card = pq(card)
            r = {
                "title": card("h3 a").attr("title"),
                "playing": card("h4 span").text(),
                "genre": card(".mix-item-genre").text(),
                # stream url will later be substituted in .row() access
                "url": self.base_url + (card(".img-container a").attr("href") or ""),
                "homepage": self.base_url + (card(".img-container a").attr("href") or ""),
                # standard size 318x318 loads quicker
                "img": card(".img-container img").attr("src"), # re.sub("=318&", "=32&", ...)
                "listeners": int(card(".media-plays").text() or 0),
                "bitrate": sum(int(a or 0) for a in card(".media-likes, .media-downloads, .media-favourites").text().split()),
            }
            streams.append(r)
        #log.DATA( streams )
        return streams
```
︙
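In the extraction hunk, two row fields are repurposed: `listeners` receives the play count, and `bitrate` receives the sum of likes, downloads and favourites. To illustrate that mapping without the pyquery dependency, the sketch below swaps in the standard library's `html.parser`. `MediaSpans` and `card_stats` are hypothetical names; only the `media-*` class names come from the sample markup in the docstring.

```python
from html.parser import HTMLParser


class MediaSpans(HTMLParser):
    """Collect the text of <span> elements carrying a 'media-*' class."""

    def __init__(self):
        super().__init__()
        self.stats = {}
        self._current = None  # class name of the span being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            for name in (dict(attrs).get("class") or "").split():
                if name.startswith("media-"):
                    self._current = name
                    self.stats[name] = ""

    def handle_data(self, data):
        if self._current:
            self.stats[self._current] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def card_stats(card_html):
    """Map a card's counters onto the row fields the plugin reuses."""
    parser = MediaSpans()
    parser.feed(card_html)
    parser.close()
    text = {k: v.strip() for k, v in parser.stats.items()}
    return {
        # play count goes into the listeners column
        "listeners": int(text.get("media-plays") or 0),
        # likes + downloads + favourites are summed into the bitrate column
        "bitrate": sum(int(text.get(k) or 0)
                       for k in ("media-likes", "media-downloads", "media-favourites")),
    }


# Trimmed from the sample markup in the plugin's docstring:
SAMPLE = """<div class="media-stats">
<a href="/stats/dj%20rikki/different-mix-2/plays/1"><span class="media-plays">141</span></a>
<a href="/stats/dj%20rikki/different-mix-2/downloads/1"><span class="media-downloads">2</span></a>
<a href="/stats/dj%20rikki/different-mix-2/likes/1"><span class="media-likes">0</span></a>
<a href="/stats/dj%20rikki/different-mix-2/favourites/1"><span class="media-favourites">0</span></a>
</div>"""
print(card_stats(SAMPLE))  # {'listeners': 141, 'bitrate': 2}
```

Missing or empty counters fall back to 0, mirroring the `or 0` guards in the plugin's dict.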