September 22, 2019 News Magazine

Hatebase catalogues the world’s hate speech in real time so you don’t have to

Policing hate speech is something nearly every online communication platform struggles with. Because to police it, you must detect it; and to detect it, you must understand it. Hatebase is a company that has made understanding hate speech its primary mission, and it provides that understanding as a service — an increasingly valuable one.

Essentially Hatebase analyzes language use on the web, structures and contextualizes the resulting data, and sells (or provides) the resulting database to companies and researchers that don’t have the expertise to do this themselves.

The Canadian company, a small but growing operation, emerged out of research at the Sentinel Project into predicting and preventing atrocities based on analyzing the language used in a conflict-ridden region.

“What Sentinel discovered was that hate speech tends to precede escalation of these conflicts,” explained Timothy Quinn, founder and CEO of Hatebase. “I partnered with them to build Hatebase as a pilot project — basically a lexicon of multilingual hate speech. What surprised us was that a lot of other NGOs [non-governmental organizations] started using our data for the same purpose. Then we started getting a lot of commercial entities using our data. So last year we decided to spin it out as a startup.”

You might be thinking, “what’s so hard about detecting a handful of ethnic slurs and hateful phrases?” And sure, anyone can tell you (perhaps reluctantly) the most common slurs and offensive things to say, in their language… that they know of. There’s much more to hate speech than just a couple of ugly words. It’s an entire genre of slang, and the slang of a single language would fill a dictionary. What about the slang of all languages?

A shifting lexicon

As Victor Hugo pointed out in Les Misérables, slang (or “argot” in French) is the most mutable part of any language. These words can be “solitary, barbarous, sometimes hideous words… Argot, being the idiom of corruption, is easily corrupted. Moreover, as it always seeks disguise so soon as it perceives it is understood, it transforms itself.”

Not only are slang and hate speech voluminous, but they are ever-shifting. So the task of cataloguing them is a continuous one.

Hatebase uses a combination of human and automated processes to scrape the public web for uses of hate-related terms. “We go out to a bunch of sources — the biggest, as you might imagine, is Twitter — and we pull it all in and turn it over to Hatebrain. It’s a natural language program that goes through the post and returns true, false, or unknown.”

True means it’s pretty sure it’s hate speech — as you can imagine, there are plenty of examples of this. False means no, of course. And unknown means it can’t be sure; perhaps it’s sarcasm, or academic chatter about a phrase, or someone using a word who belongs to the group and is attempting to reclaim it or rebuke others who use it. Those are the values that go out via the API, and users can choose to look up more information or context in the larger database, including location, frequency, level of offensiveness, and so on. With that kind of data you can understand global trends, correlate activity with other events, or simply keep abreast of the fast-moving world of ethnic slurs.
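As a rough illustration of the three-valued result described above, here is a minimal sketch in Python. The article does not say how Hatebrain arrives at its verdict, so the confidence score, the two thresholds, and the names `Verdict` and `to_verdict` are all assumptions made for this example, not Hatebase's actual API or method.

```python
from enum import Enum


class Verdict(Enum):
    """The three outcomes the article describes: true, false, or unknown."""
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def to_verdict(confidence: float, lower: float = 0.3, upper: float = 0.8) -> Verdict:
    """Map a confidence score in [0, 1] to a three-valued verdict.

    The thresholds are hypothetical; anything between them is treated as
    ambiguous (sarcasm, academic discussion, reclaimed usage, and so on).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= upper:
        return Verdict.TRUE
    if confidence <= lower:
        return Verdict.FALSE
    return Verdict.UNKNOWN
```

Keeping an explicit "unknown" band, rather than forcing every post into true or false, is what leaves room for human reviewers to settle the ambiguous cases.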

Hate speech being flagged all around the world — these were a handful detected today, along with the latitude and longitude of the IP they came from.

Quinn doesn’t pretend the process is magical or perfect, though. “There are very few 100 percents coming out of Hatebrain,” he explained. “It varies a little from the machine learning approach others use. ML is great when you have an unambiguous training set, but with human speech, and hate speech, which can be so nuanced, that’s when you get bias floating in. We just don’t have a massive corpus of hate speech, because no one can agree on what hate speech is.”

That’s part of the problem faced by companies like Google, Twitter, and Facebook — you can’t automate what can’t be automatically understood.

Fortunately Hatebrain also employs human intelligence, in the form of a corps of volunteers and partners who authenticate, adjudicate, and aggregate the more ambiguous data points.

“We have a bunch of NGOs that partner with us in linguistically diverse regions around the world, and we just launched our ‘citizen linguists’ program, which is a volunteer arm of our company, and they’re constantly updating and approving and cleaning up definitions,” Quinn said. “We place a high degree of authenticity on the data they provide us.”

That local perspective can be crucial for understanding the context of a word. He gave the example of a word in Nigeria, which when used between members of one group means friend, but when used by that group to refer to someone else means uneducated. It’s unlikely anyone but a Nigerian would be able to tell you that. Currently Hatebase covers 95 languages in 200 countries, and they’re adding to that all the time.

Furthermore there are “intensifiers,” words or phrases that are not offensive on their own but serve to indicate whether someone is emphasizing the slur or phrase. Other factors enter into it too, some of which a natural language engine may not be able to recognize because it has so little data concerning them. So in addition to keeping definitions up to date, the team is also constantly working on improving the parameters used to categorize speech Hatebrain encounters.
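One way to picture how intensifiers could feed into a score is the sketch below. It is only an illustration under stated assumptions: the placeholder vocabulary, the base score, the boost per intensifier, and the two-word window are all invented for the example, and the article does not describe how Hatebrain actually weighs these signals.

```python
# Hypothetical placeholder vocabulary; a real lexicon would be curated by linguists.
LEXICON = {"term_x"}
INTENSIFIERS = {"total", "utter"}


def score_post(text, lexicon=LEXICON, intensifiers=INTENSIFIERS,
               base=0.5, boost=0.2, window=2):
    """Return a score in [0, 1]: 0.0 if no lexicon term appears, otherwise
    a base score raised by each intensifier found within `window` words."""
    # Naive whitespace tokenization; punctuation is not stripped.
    tokens = text.lower().split()
    best = 0.0
    for i, token in enumerate(tokens):
        if token not in lexicon:
            continue
        # Words immediately before and after the matched term.
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        hits = sum(1 for t in nearby if t in intensifiers)
        best = max(best, min(1.0, base + boost * hits))
    return best
```

Note that an intensifier on its own scores nothing here, which matches the description above: such words are not offensive by themselves and only matter next to a flagged term.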

Building a better database for science and profit

The system just ingested its millionth hate speech sighting (out of perhaps ten times that many phrases evaluated), which sounds simultaneously like a lot and a little. It’s a little because the volume of speech on the internet is so vast that one rather expects even the tiny proportion of it constituting hate speech to add up to millions and millions.

But it’s a lot because no one else has put together a database of this size and quality. A vetted, million-data-point set of words and phrases classified as hate speech or not hate speech is a valuable commodity all on its own. That’s why Hatebase provides it for free to researchers and institutions using it for humanitarian or scientific purposes.

But companies and larger organizations looking to outsource hate speech detection for moderation purposes pay a license fee, which keeps the lights on and allows the free tier to exist.

“We’ve got, I think, four of the world’s ten largest social networks pulling our data. We’ve got the UN pulling data, NGOs, the hyper local ones working in conflict areas. We’ve been pulling data for the LAPD for the last couple years. And we’re increasingly talking to government departments,” Quinn said.

They have a number of commercial clients, many of which are under NDA, Quinn noted, but the most recent to join up did so publicly, and that’s TikTok. As you can imagine, a popular platform like that has a great need for quick, accurate moderation.

In fact it’s something of a crisis, since laws are coming into play that impose enormous penalties on companies that don’t promptly remove offending content. That kind of threat really loosens the purse strings: if a fine could be in the tens of millions of dollars, paying a significant fraction of that for a service like Hatebase’s is a good investment.

“These big online ecosystems need to get this stuff off their platforms, and they need to automate a certain percentage of their content moderation,” Quinn said. “We don’t ever think we’ll be able to get rid of human moderation, that’s a ridiculous and unachievable goal. What we want to do is help automation that’s already in place. It’s increasingly unrealistic that every online community under the sun is going to build up their own massive database of multilingual hate speech, their own AI. The same way companies don’t have their own mail server any more, they use Gmail, or they don’t have server rooms, they use AWS. That’s our model, we call ourselves hate speech as a service. About half of us love that term, half don’t, but that really is our model.”

Hatebase’s commercial clients have made the company profitable from day one, but they’re “not rolling in cash by any means.”

“We were nonprofit until we spun out, and we’re not walking away from that, but we wanted to be self-funding,” Quinn said. Relying on the kindness of rich strangers is no way to stay in business, after all. The company is hiring and investing in its infrastructure, but Quinn indicated that they’re not looking to juice growth or anything — just make sure the jobs that need doing have someone to do them.

In the meantime it seems clear to Quinn and everyone else that this kind of information has real value, though it’s rarely simple.

“It’s a really, it’s a really complicated problem. We always grapple with it, you know, in terms of, well, what role does hate speech play? What role does misinformation play? What role do socioeconomics play?” he said. “There’s a great paper that came out of the University of Warwick, they studied the correlation between hate speech and violence against immigrants in Germany over, I want to say, 2015 to 2017. They graph it out. And it’s peak for peak, you know, valley for valley. It’s amazing. We don’t do a hell of a lot of analysis; we’re a data provider.”

“But now we have, like, almost 300 universities pulling the data, and they do those kinds of analyses. So that’s very validating for us.”

You can learn more about Hatebase, join the Citizen Linguists or research partnership, or see recent sightings and updates to the database at the company’s website.
