
Nationwide University URLs

Getting ready to dig for edu SRC findings over the winter break.

The script is still rough around the edges, so pointers from the more experienced folks are welcome.

import requests
import re


def get_school():
    """Collect the links to each province's page on the hao123 education portal."""
    links = []
    url = 'http://www.hao123.com/edu'
    response = requests.get(url)
    # Capture every href value that appears before a target attribute.
    edu_links = re.findall(r'href=(.*?) target=', response.text)
    for k in edu_links:
        # Keep only the education sub-pages.
        if 'eduhtm' in k:
            links.append(k)
    # Strip the surrounding double quotes from each captured href.
    for number, l in zip(range(9999), links):
        links[number] = l.strip('"')
    # The last two entries are not province pages, so drop them.
    for k in range(2):
        del links[-1]
    return links


def get_all_url(urls):
    """Fetch every province page and pull out the anchor tag for each school."""
    all_links = []
    for url in urls:
        res = requests.get(url)
        res.encoding = 'gb2312'
        # Each match is one <a href="...">school name</a> fragment.
        rule = re.findall(r'<a href=".*?</a>', res.text, re.I | re.S | re.M)
        for k in rule:
            all_links.append(k)
    return all_links


if __name__ == '__main__':
    baidu = []
    link = get_school()
    all_urls = get_all_url(link)
    for number, url in zip(range(1, 9999), all_urls):
        if 'baike' in url:
            # Schools that only link to a Baidu Baike entry are set aside.
            baidu.append(url)
        else:
            try:
                urls = re.findall(r'"(.*?)"', url)
                school = re.findall(r'">(.*?)<', url)
                print(str(number) + "-" * 18 + school[0] + ":" + urls[0])
            except IndexError:
                pass
    print("{} links have not been crawled yet".format(len(baidu)))
    print("-" * 30 + "not crawled below" + "-" * 30)
    for k, j in zip(baidu, range(1, 9999)):
        name = re.findall(r'">(.*?)<', k)
        print(str(j) + "-" * 18 + name[0])
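Since the point is to reuse these addresses later for edu SRC recon, a minimal follow-up sketch could write the school/URL pairs to a file instead of only printing them. It assumes the get_school() and get_all_url() functions above; the helper name save_school_urls and the output filename edu_urls.txt are just illustrative choices.

def save_school_urls(path='edu_urls.txt'):
    """Save 'school name:homepage URL' pairs, one per line, to a text file."""
    results = []
    for chunk in get_all_url(get_school()):
        if 'baike' in chunk:
            continue  # skip schools that only have a Baidu Baike entry
        hrefs = re.findall(r'"(.*?)"', chunk)
        names = re.findall(r'">(.*?)<', chunk)
        if hrefs and names:
            results.append(names[0] + ':' + hrefs[0])
    with open(path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(results))
    return len(results)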
