Implementation of search robot's function to collect information in scientometric systems

Anis F. Galimyanov

Abstract

The World Wide Web is developing rapidly, and the automated collection and analysis of information published on web resources becomes a more pressing problem every day. In the 1990s, the World Wide Web was a huge body of poorly structured information that was difficult for a person to search. It was then that the first automated agents began to appear, easing the task of finding the necessary information on the web. The main part of such systems is a search robot: a software package that navigates through web resources and collects information for a database. At Kazan (Volga Region) Federal University (KFU), a monthly rating of academic staff is compiled from data that employees enter in their personal accounts in the Electronic University system. There is now a need to stop having KFU staff fill in the Hirsch index manually in their personal accounts, both to avoid incorrect data entry and to spare the Prospective Development Center from validating the entered information. This required a search robot that automatically collects the Hirsch indices of KFU employees from the Scopus system. This article discusses the search robot: what it is, how it works, and how to write a program that collects information. It considers the possible types of search robots and the whole process of their work, as well as the Scopus scientometric system and one scientometric indicator, the Hirsch index, including its purpose and calculation. The implementation uses the Python programming language, and the article reviews tools for making HTTP requests and processing HTML pages.
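
As an illustration of the Hirsch index calculation mentioned in the abstract, the following is a minimal sketch in Python, not the article's own code. It uses the standard definition (the largest h such that h publications each have at least h citations). The function name h_index and the sample citation counts are hypothetical and chosen only for the example.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each.

    `citations` is an iterable of non-negative citation counts, one per paper.
    The name and interface are illustrative assumptions, not taken from the article.
    """
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Hypothetical citation counts for one author (invented for the example).
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))  # prints 4
```

In a search robot of the kind described here, the citation counts would come from the pages collected for each author rather than from a hard-coded list. Scopus also reports the index directly, so a function like this mainly serves as a check on the collected value.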

Keywords

search robot, spider, crawler, bot, parser, robot, Hirsch index, Scopus, Python

Full text:

PDF (English)

References

Web Scraping with Python. Ryan Mitchell, 2015 https://yanfei.site/docs/dpsa/references/PyWebScrapingBook.pdf

Official documentation for Python library Beautiful Soup. https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Requests Documentation Release 2.21.0. Kenneth Reitz, 2019 https://buildmedia.readthedocs.org/media/pdf/requests/master/requests.pdf

Scopus. 2018 https://ru.wikipedia.org/wiki/Scopus

Methods of search in the Scopus database. Dudnikova O.V., Bondarenko S.A., 2011 https://library.sfedu.ru/media/upload/%20Материалы%20ДПО%20/Учебно-методическое%20пособие_Scopus2.pdf

Search robots. Markova T.I., Zakharova K.V. 2009 https://cyberleninka.ru/article/v/poiskovye-roboty

Adaptive crawler for searching and collecting external hyperlinks. Pechnikov A.A., Chernobrovkin D.I., 2012 https://cyberleninka.ru/article/v/adaptivnyy-krauler-dlya-poiska-i-sbora-vneshnih-giperssylok

Hirsch Index. https://ru.wikipedia.org/wiki/Indeks_Hirsha

What is the Hirsch index and how to raise it? Alex Zvansky, 2017 https://wos-scopus.com/chto-takoe-indeks-hirsha/

HTTP response codes. 2019 https://developer.mozilla.org/ru/docs/Web/HTTP/Status

Search robot. https://ru.wikipedia.org/wiki/Search_robot

The search robot is what it is and how it works. http://seo-dnevnik.ru/blogosfera/poiskovyiy-robot-robotyi-poiskovyih-sistem.html

Bot (program). https://ru.wikipedia.org/wiki/bot_ (program)

What is a search robot? https://wiki.rookee.ru/poiskovyj-robot/

HTML. 2019, https://ru.wikipedia.org/wiki/HTML

Search robots. 2010, http://wiki.webimho.ru/search exploit

Search engine robots. 2006, https://www.seonews.ru/masterclasses/robotyi-poiskovyih-sistem/

Jahwari, N. A., & Khan, M. F. (2016). ORGANIZATIONAL LEARNING MECHANISMS IN SOHAR UNIVERSITY. Humanities & Social Sciences Reviews, 4(2), 76-87. https://doi.org/10.18510/hssr.2016.423

Shirvani, M., Mohammadi, A., & Shirvani, F. (2015). Comparative study of cultural and social factors affecting urban and rural women's Burnout in Shahrekord Township. UCT Journal of Management and Accounting Studies, 3(1), 1-4.
