“Best of Breed” – that is our goal, for our customers and for ourselves. To achieve it, we collect high volumes of data from online and offline sources, including websites, blogs, and social networks. We have developed a unique large-scale crawling framework built on current big data technology such as Hadoop and MapReduce. It allows us to acquire web data at any order of magnitude – really! – and dive into it for analysis and evaluation.
DATAlovers’ crawling technology is based on Apache Nutch, an Apache top-level project. That sounds complicated, but it isn’t – at least not for us. Thanks to the distributed MapReduce architecture, websites of any size can be crawled with ease. That is how we acquire the data we want to include in our analysis; everything else comes afterwards. Typical applications include crawling websites, querying APIs (Facebook, Twitter, YouTube, etc.), and scraping prices from shopping portals. Duplicate detection eliminates data sets that are very similar, so-called “near duplicates”. That way nothing – and we mean nothing – can distort our results.
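For the curious, here is a toy sketch of how near-duplicate detection can work: compare documents by the overlap of their word shingles. The shingle size and similarity threshold below are illustrative assumptions, not our production settings (Nutch ships its own deduplication jobs).

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(doc1, doc2, threshold=0.8):
    """Flag two documents as near duplicates above a similarity threshold."""
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold
```

Two pages that differ only in a footer or a single word land well above the threshold and are dropped; genuinely different pages score near zero.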
The digital revolution is marching on unchecked – of that we are certain. It is accompanied by a restructuring of textual data that is growing by roughly 50 percent per year. That is no small amount. Extracting structured information from unstructured content is therefore the first and most important step. Information extraction is the branch of text mining that automatically recognizes embedded information or keywords within texts. This central information, drawn from the overall content of the texts, is the foundation for every subsequent processing stage – grouping texts by similar content, or displaying selected features in the graphical interface. In this way we make the hidden content of texts visible and usable.
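One small piece of information extraction, reduced to a sketch: pull candidate keywords out of raw text by counting tokens against a stop-word list. Our production pipeline is far more sophisticated; the stop list and scoring below are illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop list; a real system uses a full, per-language list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to",
              "is", "are", "for", "with", "that", "this", "it", "as"}

def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stop-word tokens of a text."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    counts = Counter(t for t in tokens
                     if t not in STOP_WORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```

The extracted keywords are exactly the kind of “central information” that later stages build on – grouping similar texts, or surfacing features in the interface.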
Last but not least, the data and information we gather need to be presented as efficiently and intuitively as possible. Only then can we fulfill all of your needs. On the one hand, the data has to be condensed so that you can grasp the main points at a glance; on the other, it is essential that users can dive deep enough to uncover stories and patterns. We achieve that by rigorously combining several approaches.
Our Lead Prediction technology uses modern deep learning algorithms to tell you whom best to target as potential future customers. We analyze your successful customer relationships to build a profile of your typical customer, which is then compared against the entire business universe. That way we can be confident of recommending the potential new customers most likely to convert. It’s as close as you can get to knowing the future.
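The core idea, stripped down to a toy sketch: build an average profile from existing customers and score every prospect against it. Production uses deep learning; the averaged-profile and cosine-similarity approach below is a deliberately simplified stand-in, and the feature vectors are invented for illustration.

```python
import math

def average_profile(customers):
    """Component-wise mean of the feature vectors of known customers."""
    n = len(customers)
    return [sum(c[i] for c in customers) / n
            for i in range(len(customers[0]))]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_prospects(customers, prospects):
    """Rank (name, features) prospects by similarity to the customer profile."""
    profile = average_profile(customers)
    return sorted(prospects, key=lambda p: cosine(p[1], profile), reverse=True)
```

The prospects that look most like your existing customers rise to the top of the list – those are the recommendations.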
Trend detection – drawing on Twitter, news outlets, and market research data – keeps a finger on the pulse of the data streams. We are constantly asking: has something changed? Is there a new topic, something trending, that we should know about? We adapt indicators from the world of the stock market to recognize trend shifts in our time series.
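A moving-average crossover is one classic stock-market indicator of this kind: when the short-term average of, say, a topic’s daily mention count crosses above its long-term average, the topic may be trending upward. The window sizes below are illustrative assumptions, not our production parameters.

```python
def moving_average(series, window):
    """Trailing moving average; None until enough data points exist."""
    return [None if i + 1 < window
            else sum(series[i + 1 - window:i + 1]) / window
            for i in range(len(series))]

def crossover_points(series, short=3, long=7):
    """Indices where the short-window average crosses above the long one."""
    s = moving_average(series, short)
    l = moving_average(series, long)
    hits = []
    for i in range(1, len(series)):
        if None in (s[i], l[i], s[i - 1], l[i - 1]):
            continue
        if s[i - 1] <= l[i - 1] and s[i] > l[i]:
            hits.append(i)
    return hits
```

A flat series produces no signal; a sudden surge in mentions produces a crossover at the point where the short-term average overtakes the long-term one – a candidate trend shift.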
And what do we do with all this analysis? We visualize it clearly in plots and graphs, segment search results effectively, and, brick by brick, make the true depth of the data accessible. And then, suddenly, it is all there – all the knowledge you need for your daily workflow.