Have you ever found that links on your website broke after an upgrade, or that some links suddenly stopped working and now return 404 (page not found) errors? There may be an easier fix than creating redirects by hand. Researchers claim to have developed an algorithm that can fix 90 percent of all broken links on a web server, provided the resources are still present.
Mohammad Pourzaferani and Mohammad Ali Nematbakhsh of the University of Isfahan have developed a new method of fixing broken links, which they claim can trap missing links on a web server and thereby identify the resource that has become detached.
The duo said that existing solutions to the problem of broken links and detached data share inherent weaknesses: they target a single point of failure, and they rely on knowledge of the destination data source. These methods don’t take into account wider database issues that may have affected the server.
The team said their method instead concentrates on the source point of links, and that they have developed a way to discover the new address of a digital entity that has become detached.
“The proposed algorithm uses the fact that entities preserve their structure even after movement to another location. Therefore, the algorithm creates an exclusive graph structure for each entity,” said Pourzaferani.
“When the broken link is detected the algorithm starts its task to find the new location for detached entity or the best similar candidate for it.
“To this end, the crawler controller module searches for the superiors of each entity in the inferior dataset, and vice versa. After some steps the search space is narrowed and the best candidate is chosen,” said Pourzaferani.
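The idea Pourzaferani describes can be illustrated with a rough sketch. This is not the researchers' implementation: it assumes the data is available as before and after snapshots of (subject, predicate, object) triples, uses each entity's incoming and outgoing edges as a stand-in for its "exclusive graph structure", and swaps in plain Jaccard similarity to pick the best candidate. The function names, the threshold and the sample URIs are all hypothetical.

```python
from collections import defaultdict


def build_signatures(triples):
    """Map each entity to the set of edges touching it (its 'structure').

    Assumption: an outgoing edge is stored as ("out", predicate, object)
    and an incoming edge as ("in", predicate, subject).
    """
    signatures = defaultdict(set)
    for subject, predicate, obj in triples:
        signatures[subject].add(("out", predicate, obj))
        signatures[obj].add(("in", predicate, subject))
    return signatures


def jaccard(a, b):
    """Share of edges two signatures have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_new_location(broken_uri, old_triples, new_triples, threshold=0.5):
    """Return (best_candidate, score) for a detached entity.

    The candidate is None when nothing in the new snapshot is similar
    enough. The threshold is an illustrative value, not one from the study.
    """
    old_signature = build_signatures(old_triples).get(broken_uri, set())
    best_uri, best_score = None, 0.0
    for uri, signature in build_signatures(new_triples).items():
        if uri == broken_uri:
            continue  # the dangling address itself is not a candidate
        if not (signature & old_signature):
            continue  # narrow the search: no shared neighbours at all
        score = jaccard(old_signature, signature)
        if score > best_score:
            best_uri, best_score = uri, score
    if best_score < threshold:
        return None, best_score
    return best_uri, best_score


# Hypothetical snapshots: "ex:bob_old" has moved to "ex:people/bob",
# so the link from "ex:alice" is now broken.
OLD_TRIPLES = [
    ("ex:alice", "knows", "ex:bob_old"),
    ("ex:bob_old", "worksAt", "ex:uni"),
    ("ex:bob_old", "authorOf", "ex:paper1"),
    ("ex:carol", "worksAt", "ex:uni"),
]
NEW_TRIPLES = [
    ("ex:alice", "knows", "ex:bob_old"),
    ("ex:people/bob", "worksAt", "ex:uni"),
    ("ex:people/bob", "authorOf", "ex:paper1"),
    ("ex:carol", "worksAt", "ex:uni"),
]

if __name__ == "__main__":
    print(find_new_location("ex:bob_old", OLD_TRIPLES, NEW_TRIPLES))
```

In this toy data the moved entity keeps its two outgoing edges, so it outscores the unrelated entity that merely shares an employer. A real system would also need to cope with entities whose neighbours have moved as well, which this sketch does not attempt.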
In a sample data set containing details of 300,000 people, the algorithm identified more than 5,000 entities that had changed between the before and after snapshots, and it managed to fix 9 out of 10 broken links.