Mentions

Mention API resources

Mention resource

Field

Format

More information

author_id

string

See Author and source resources.

channel

enum

classifiers

array[object]

See Classifier resources

classifier_tags

array[object]

See Classifier resources

collaboration_status

integer

Resource ID.

collaboration_user

integer

Resource ID.

content

string

Mention’s content. Empty for tweets.

copyright

string

Copyright related to license data. Only present for some licenses.

crawled_at

RFC3339

Date the mention was crawled by Synthesio. Not available for all mentions.

date

RFC3339

Mention’s date

extra_properties

object

See Extra Properties resource.

id

string

Synthesio mention’s internal identifier.

infused_at

RFC3339

Date the mention was inserted into the report.

language

ISO 639-3

See Language resource.

license

array[string]

Present when the provider of the mention has specific license restrictions.

location

object

See Location resource.

meta

object

See Mention API meta resource.

native_id

string

Resource ID on the original social network (e.g. tweet ID).

parent_id

string

For comments, ID of the mention this mention answers.

review

object

See Review resource.

sentiment

enum

See Sentiment resource.

site_id

string

See Site resource.

source_id

string

See Author and source resources.

synthesio_rank

float

Score given to the mention by Synthesio, between 0 and 10.

tags

array[object]

Can be an empty array. See tag resource.

title

string

Mention’s title. For tweets, contains the whole tweet text.

twitter

object

See Twitter data resource.

type

enum

Values: post or comment.

url

string

Mention’s URL.
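The field list above implies two small chores when reading mentions: tweets keep their text in title (content is empty), and dates are RFC3339 strings, with crawled_at sometimes missing. A minimal Python sketch, assuming each mention has already been decoded into a dict (for example with json.loads); mention_text and parse_rfc3339 are hypothetical helper names, not part of the API.

```python
from datetime import datetime
from typing import Optional


def mention_text(mention: dict) -> str:
    """Return the readable text of a mention (hypothetical helper).

    Tweets have an empty `content` and carry their text in `title`,
    so fall back to `title` whenever `content` is missing or empty.
    """
    return mention.get("content") or mention.get("title") or ""


def parse_rfc3339(value: Optional[str]) -> Optional[datetime]:
    """Parse an RFC3339 date such as "2016-03-01T12:30:00Z" (hypothetical helper).

    `crawled_at` is not available for all mentions, so None is passed through.
    A trailing "Z" is rewritten as "+00:00" for Python versions before 3.11.
    """
    if not value:
        return None
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


# Usage with a made-up tweet-like mention:
tweet = {"type": "post", "content": "", "title": "Hello world", "date": "2016-03-01T12:30:00Z"}
print(mention_text(tweet))
print(parse_rfc3339(tweet["date"]).isoformat())
```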

Location resource

Field

Format

city

string

country

ISO 3166-1 alpha-3

See country resource.

county

string

latitude

string

longitude

string

state

string

Tag resource

Field

Format

Description

name

string

Tag name.

sentiment

string

See sentiment resource.

Tag names should match existing topics in the report's setup configuration.

Review resource

Field

Format

More information

date

RFC3339

Date of the last manual change made to the mention.

status

enum

See Human review status resource.

Site resource

Field

Format

More information

name

string

Site name

url

string

Base URL.

site_type_id

integer

Site category. See Medias filter.

Twitter data resource

This object is only present for Twitter site mentions. It has a null value if the tweet is neither a retweet nor a quoted retweet.

Field

Format

More information

type

enum

Values: retweet or quoted_retweet.

Author and source resources

Field

Format

native_id

string

User ID on social network (Twitter, Facebook…)

id

string

Synthesio ID

url

string

picture_url

string

URL with the ‘http:’ or ‘https:’ prefix removed.

type

enum

Values: user, page or group.

location

object

See Location resource.

full_name

string

name

string

username

string

metric_list

object

Metrics such as the number of followers. See Metric list.

demographics

object

Authors only. See Demographics resource

extra_properties

object

See Extra Properties resource.

Mention API Meta resource

The meta object is a JSON object that only contains properties relevant to the current mention. Empty fields are omitted from the payload.

Field

Format

More information

attachment_list

string

List of attachments with type, url, and optional ID.

emoji_list

array[string]

List of emoji extracted from mention content.

hashtag_list

array[string]

List of hashtags (#) extracted from mention content.

token_list

array[string]

List of tokens from mention text.

url_list

array[string]

List of URLs extracted from mention content.

Extra Properties resource

The extra properties object is a JSON object that only contains properties relevant to the current mention. Empty fields are omitted from the payload.

Field

Format

More information

restricted_usage

boolean

When true, indicates copyright restrictions.

gnipComplianceStatus

string

See GNIP Compliance Status below.

highlights

object

Contains excerpts and highlighted mention text. See Highlighted mention text below.

GNIP Compliance Status

Only applies to tweets.

  • In a mention extra_properties object, gnipComplianceStatus value can be “deleted”. This means the owner of the tweet has deleted it.

  • In an author extra_properties object, gnipComplianceStatus value can be “deleted” or “protected”. This means that either the Twitter account has been deleted or the author has since protected their tweets.

In all these cases, all of the mention's fields are removed and set to null or empty, and such mentions should be discarded. In a later version, we will provide a way to exclude them directly from the results.
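Discarding these mentions on the client side can be sketched in Python as below. This is an illustration under assumptions: the function names are hypothetical, and since this section does not say where the author object sits inside a mention payload, keep_compliant assumes it is available under an "author" key; adapt that lookup to your payload.

```python
from typing import Iterable, Iterator, Optional


def _status(obj: Optional[dict]) -> Optional[str]:
    """Read gnipComplianceStatus from an object's extra_properties, if any."""
    if not obj:
        return None
    return (obj.get("extra_properties") or {}).get("gnipComplianceStatus")


def is_compliant(mention: dict, author: Optional[dict] = None) -> bool:
    """False when the tweet was deleted, or its author deleted/protected."""
    if _status(mention) == "deleted":
        return False
    if _status(author) in ("deleted", "protected"):
        return False
    return True


def keep_compliant(mentions: Iterable[dict]) -> Iterator[dict]:
    """Yield only mentions that should be kept.

    Assumption (hypothetical): the author object is stored under "author".
    """
    for mention in mentions:
        if is_compliant(mention, mention.get("author")):
            yield mention
```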

Highlighted mention text

Field

Format

More information

title

string

Title with highlighted elements.

content

string

Content with highlighted elements.

excerpt

string

Text extract with highlighted elements.

Any of these fields can be missing:

  • if the mention has no content element (e.g. Twitter mentions only have a title element);

  • for reports with an all-content configuration, since a mention does not have to match any topic.
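Because any highlight field can be missing, a reader has to fall back from one field to the next. A minimal Python sketch, assuming the highlights object sits under extra_properties as described in the Extra Properties resource; highlighted_snippet is a hypothetical name, and the excerpt, content, title preference order is a design choice of this example rather than documented behaviour.

```python
from typing import Optional


def highlighted_snippet(mention: dict) -> Optional[str]:
    """Return the best available highlighted text for a mention.

    Tries excerpt, then content, then title (an arbitrary preference order);
    returns None when the mention carries no highlights at all.
    """
    highlights = (mention.get("extra_properties") or {}).get("highlights") or {}
    for key in ("excerpt", "content", "title"):
        if highlights.get(key):
            return highlights[key]
    return None
```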

Demographics resource

The Demographics resource is an object with various values computed by analysing the data corpus. Demographics are only available for Twitter authors.

Example of demographics resource, extracted from a mention’s author object.

{
    "demographics": {
        "affinity_list": [
            "Sports",
            "Travel & Tourism",
            "Fashion & Beauty"
        ],
        "age": "18-29years",
        "bio_tag_list": [
            "voyage",
            "twitter",
            "sport",
            "mode",
            "quotidien"
        ],
        "family_status": "unknown",
        "gender": "female",
        "job_list": [
            "Bloggers, Journalists & Authors"
        ],
        "language_list": [
            "eng"
        ],
        "marital_status": "unknown"
    }
}

Demographics: affinities

3 available languages: English, French, Spanish.

The system is based on keyword lists and detects one or more affinities per user. The keyword lists are created by human analysis, aided by machine learning.

Currently, there are 34 affinities:

  • Art & Culture

  • Automotive

  • Fashion & Beauty

  • Food & Drink

  • Food & Drink: Beer

  • Food & Drink: Cooking

  • Food & Drink: Tea & Coffee

  • Food & Drink: Wine

  • Graphic Design

  • Health & Fitness

  • Home & Hobbies

  • Indoor Games

  • Literature

  • Movies & Cinema

  • Music

  • Nature & Wildlife

  • News & Current Affairs

  • Nightlife & Parties

  • Outdoor Action & Adventure

  • Pets

  • Photography

  • Politics & Activism

  • Religion

  • Science

  • Sports

  • Sports: Football

  • Sports: Water sports

  • Sustainable Living

  • TV

  • Technology

  • Technology: Open source

  • Technology: Telecom & Mobility

  • Travel & Tourism

  • Video Games

Format

  • affinity_list (array[string]): One of the above listed values.

Demographics: age detection

5 available languages: English, French, Spanish, Italian, Portuguese.

A machine learning system “guesses” the user’s age, estimating it from the user’s vocabulary and profile information.

Value

Description

<18years

Generation Z (< 18 years)

18-29years

Generation Y (18-29 years)

>=30years

Generation X (> 29 years)

unknown

Format

  • age (string): One of the above listed values.

Demographics: family status detection

5 available languages: English, French, Spanish, Italian, Portuguese.

Information is obtained from user bios by matching words against a curated list built from human analysis and machine learning.

Value

Description

Children

unknown

Format

  • family_status (string): One of the above listed values.

Demographics: gender detection

12 available languages: French, English, Spanish, German, Italian, Portuguese, Russian, Turkish, Arabic, Japanese, Dutch, Indonesian.

Using a first-name dictionary for each language, the system classifies users as male or female.

Possible values:

  • male

  • female

  • unknown

Format

  • gender (string): One of the above listed values.

Demographics: interests

12 available languages: French, English, Spanish, German, Italian, Portuguese, Russian, Turkish, Arabic, Japanese, Dutch, Indonesian.

A kind of word cloud built from Twitter users’ bios. Interests are identified by analysing bios for keywords that provide information about users.

Format

  • bio_tag_list (array[string]): Array of words.

Demographics: job detection

5 available languages: English, French, Spanish, Italian, Portuguese.

Information is obtained from user bios by matching words against a curated list built from human analysis and machine learning.

Currently, 4 types of job are detected:

  • Bloggers, Journalists & Authors

  • Education & Academics

  • Tech jobs

  • Top Managers

Format

  • job_list (array[string]): Each item is one of the above listed values.

Demographics: marital status detection

5 available languages: English, French, Spanish, Italian, Portuguese.

Information is obtained from user bios by matching words against a curated list built from human analysis and machine learning.

Possible values:

  • Single

  • Couple

  • unknown

Format

  • marital_status (string): One of the above listed values.

Demographics: user’s language(s) detection

Twitter users can specify only one language in their profile, but in practice they may tweet in several languages. For example, user A set English as their default language, but half of their tweets are in French. All the languages a user tweets in are therefore stored, so a user can have more than one language.

Format

  • language_list (array[string]): ISO 639-3 language codes.

Metric list

The metric list is a list of metrics related to the source. Metrics are site-specific in most cases: Twitter and Instagram authors have a “followers” metric, and YouTube channels have a “subscribers” metric.

Example of metric list, extracted from a mention’s source object.

{
    "metric_list": [
        {
            "name": "followers",
            "date": "1787-02-31T16:22:54+01:00",
            "value": 649
        }
    ]
}
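Since a metric list can hold several entries, possibly for the same metric at different dates, a consumer usually wants the most recent value of one named metric. A minimal Python sketch, assuming metric dates are RFC3339 strings with an offset or a trailing "Z"; latest_metric is a hypothetical helper name.

```python
from datetime import datetime
from typing import Optional


def _parse(value: str) -> datetime:
    # RFC3339 dates carry an offset; rewrite "Z" for Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_metric(source: dict, name: str) -> Optional[float]:
    """Return the most recent value of metric `name` (e.g. "followers").

    Metrics are site-specific, so None is returned when the source has
    no metric with that name.
    """
    entries = [m for m in source.get("metric_list") or [] if m.get("name") == name]
    if not entries:
        return None
    newest = max(entries, key=lambda m: _parse(m["date"]))
    return newest["value"]
```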

Classifier resources

Contains the classifiers on the mention. The format is a list of objects, with one object per classifier.

Field

Format

id

string

Classifier ID

name

string

Classifier name

Classifier tags have the exact same format.

Analytics

Computes analytics on the requested fields for a given report. This resource requires an OAuth token from the Synthesio security API.

Each analytic may produce several results; each result in a group is called a “bucket”. When requesting analytics on several fields, the first one is the outermost level in the result set.

The meta information of the result always contains a hits value indicating the number of mentions matching the request filters. This value helps determine whether an empty result set is caused by mentions lacking the requested data, by an invalid analytics request, or by filters that excluded all mentions from the computation.

JSON Payload

Base payload for any analytics request:

  • aggs (array[object], required): list of fields on which to compute analytics. Each object must respect this format:

    • field (string, required): name of the field on which to compute the analytic.

    • type (string, optional): type of analytics to compute. The default depends on the field; see the table below.

    • Other parameters depend on the analytics type; see Available analytics types.

  • filters (object, optional)

When requesting analytics on several fields, the last one is the innermost level in the result set.

Example Mention API analytics request: get the number of mentions per sentiment per month.

POST https://rest.synthesio.com/mention/v2/reports/38298/_analytics HTTP/1.1
Host: rest.synthesio.com
Authorization: Bearer YOUR_API_TOKEN
Content-Type: application/json

{
    "aggs": [
        {
            "field": "date",
            "interval": "month"
        },
        {
            "field": "sentiment"
        }
    ],
    "filters": {
        "period": {
            "begin": "2016-03-01T00:00:00Z",
            "end": "2016-05-01T00:00:00Z"
        }
    }
}

Response

{
  "meta": {
    "hits": 40476
  },
  "data": {
    "2016-03": {
      "value": 24004,
      "data": {
        "neutral": {
          "value": 19557
        },
        "positive": {
          "value": 3094
        },
        "unassigned": {
          "value": 1010
        },
        "negative": {
          "value": 343
        }
      }
    },
    "2016-04": {
      "value": 16472,
      "data": {
        "neutral": {
          "value": 12893
        },
        "positive": {
          "value": 2320
        },
        "unassigned": {
          "value": 897
        },
        "negative": {
          "value": 362
        }
      }
    }
  }
}
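The request and response above can be handled from Python with only the standard library. The sketch below prepares the POST to the _analytics route (using the URL and headers shown in the example) and walks the nested buckets of a terms or date histogram response into flat rows. build_analytics_request and flatten_buckets are hypothetical helper names, and the flattening assumes every bucket looks like the ones above: a "value" plus an optional nested "data" object.

```python
import json
import urllib.request
from typing import Iterator, Tuple

BASE_URL = "https://rest.synthesio.com/mention/v2/reports"


def build_analytics_request(report_id: int, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the _analytics route."""
    return urllib.request.Request(
        f"{BASE_URL}/{report_id}/_analytics",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def flatten_buckets(data: dict, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], int]]:
    """Yield (bucket path, value) for the innermost buckets.

    A bucket with a nested "data" object is split further by the next field.
    """
    for key, bucket in data.items():
        if "data" in bucket:
            yield from flatten_buckets(bucket["data"], path + (key,))
        else:
            yield path + (key,), bucket["value"]


# Usage (requires a valid token and network access):
# req = build_analytics_request(38298, "YOUR_API_TOKEN", payload)
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
# for path, count in flatten_buckets(body["data"]):
#     print(" / ".join(path), count)
```

Comparing the sum of the flattened counts with meta.hits is a quick sanity check for single-valued fields such as sentiment.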

Available analytics types

Terms (count)

Counts the number of mentions for each distinct value of the requested field.

Specific parameters

  • size (integer, optional): Maximum number of distinct values to return. Defaults to 0, which means no maximum: all values are returned.

  • min_doc_count (integer, optional): Minimum number of mentions for a value to appear in the result set. Defaults to 0.

  • include (string or array[string], optional): Only include values that match the filter in the result set. The filter can be either a regular expression or an array of exact values.

  • exclude (string or array[string], optional): Exclude matching values from the result set. The filter can be either a regular expression or an array of exact values.
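When building payloads programmatically, it helps to emit only the parameters that are actually set, so the documented defaults apply otherwise. A minimal Python sketch; terms_agg is a hypothetical helper, and it assumes the type value for this analytic is the string "terms", as listed in the fields table.

```python
from typing import List, Optional, Union


def terms_agg(
    field: str,
    size: Optional[int] = None,
    min_doc_count: Optional[int] = None,
    include: Optional[Union[str, List[str]]] = None,
    exclude: Optional[Union[str, List[str]]] = None,
) -> dict:
    """Build one aggs entry of type "terms".

    Unset parameters are omitted so the API defaults apply
    (size 0 = all values, min_doc_count 0).
    """
    agg = {"field": field, "type": "terms"}
    optional = {"size": size, "min_doc_count": min_doc_count,
                "include": include, "exclude": exclude}
    agg.update({key: value for key, value in optional.items() if value is not None})
    return agg


# Usage: the 10 most frequent tags seen on at least 5 mentions.
print(terms_agg("tags", size=10, min_doc_count=5))
```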

Date histogram

Counts the number of mentions for each interval based on the requested date field. Only applies to date fields.

Specific parameters

  • interval (string, required): size of the intervals into which mention counts are split. See Available intervals.

  • offset (string, optional): shifts interval boundaries by the requested period. Same format as interval.

  • time_zone (string or float, optional): time zone to use when computing interval boundaries. Can be:

    • A positive or negative float.

    • A string in the “HH:MM” format, with an optional sign. Example: -02:30.

    • A time zone name. This is the only way to ensure daylight saving time changes are correctly taken into account. Example: Europe/Paris.

  • min_doc_count (integer, optional): Minimum number of mentions for a value to appear in the result set. Defaults to 0. Note: 0 is the only value that guarantees continuous intervals in the result set.

Available intervals

  • day (same as 1d or 24h)

  • month (same as 1M)

  • week (same as 1w)

  • year (same as 1y)

  • a custom duration in milliseconds

  • a custom duration, like 2d for 2 days, using one of these units:

    • y Year

    • M Month

    • w Week

    • d Day

    • h Hour

    • m Minute

    • s Second
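A client can catch malformed intervals before sending a request. The Python sketch below checks the forms listed above (named intervals, a number followed by one of the units y, M, w, d, h, m, s, or a duration in milliseconds) and builds a date histogram entry. The helper names are hypothetical; passing milliseconds as an integer is an assumption, and the check does not validate offset or time_zone values.

```python
import re
from typing import Optional, Union

NAMED_INTERVALS = {"day", "week", "month", "year"}
# A positive integer followed by one of the documented units.
CUSTOM_INTERVAL = re.compile(r"\d+[yMwdhms]")


def is_valid_interval(interval: Union[str, int]) -> bool:
    """Client-side check that an interval uses one of the documented forms."""
    if isinstance(interval, int):
        return interval > 0  # assumed: custom duration in milliseconds
    return interval in NAMED_INTERVALS or CUSTOM_INTERVAL.fullmatch(interval) is not None


def date_histogram_agg(
    field: str,
    interval: Union[str, int],
    time_zone: Optional[Union[str, float]] = None,
    offset: Optional[str] = None,
    min_doc_count: Optional[int] = None,
) -> dict:
    """Build one aggs entry of type "date_histogram", omitting unset parameters."""
    if not is_valid_interval(interval):
        raise ValueError(f"unsupported interval: {interval!r}")
    agg = {"field": field, "type": "date_histogram", "interval": interval}
    for key, value in (("time_zone", time_zone), ("offset", offset), ("min_doc_count", min_doc_count)):
        if value is not None:
            agg[key] = value
    return agg


# Usage: weekly counts with boundaries computed in Paris local time.
print(date_histogram_agg("date", "week", time_zone="Europe/Paris"))
```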

Sum, average and statistics

These analytics types do not accept any other parameters. They produce a single result and therefore must be used as the last item of the aggs array.

  • sum: Sum of values.

  • avg: Average of values.

  • min: Minimum value.

  • max: Maximum value.

  • stats: Returns several values: minimum, maximum, average, sum and mention count.
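The rule that single-value types must come last can be checked before a request is sent. A minimal Python sketch with hypothetical helper names; besides an explicit type, it treats influence and synthesio_rank as single-value because their default type is sum in the Fields available for analytics table.

```python
from typing import List, Optional

SINGLE_VALUE_TYPES = {"sum", "avg", "min", "max", "stats"}
# Fields whose documented default type is the single-value "sum".
DEFAULT_SUM_FIELDS = {"influence", "synthesio_rank"}


def effective_type(agg: dict) -> Optional[str]:
    """Explicit "type" if given, else "sum" for fields that default to it."""
    if "type" in agg:
        return agg["type"]
    return "sum" if agg.get("field") in DEFAULT_SUM_FIELDS else None


def check_aggs_order(aggs: List[dict]) -> None:
    """Raise ValueError if a single-value analytic is not the last aggs item."""
    last = len(aggs) - 1
    for position, agg in enumerate(aggs):
        kind = effective_type(agg)
        if kind in SINGLE_VALUE_TYPES and position != last:
            raise ValueError(f"{kind!r} on {agg.get('field')!r} must be the last aggs item")


# Usage: average Synthesio rank per month passes the check.
check_aggs_order([{"field": "date", "interval": "month"}, {"field": "synthesio_rank", "type": "avg"}])
```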

Fields available for analytics

Field

Default type

author.demographics.affinity_list

terms

author.demographics.age

terms

author.demographics.bio_tag_list

terms

author.demographics.family_status

terms

author.demographics.gender

terms

author.demographics.job_list

terms

author.demographics.language_list

terms

author.demographics.marital_status

terms

author.location.city

terms

author.location.state

terms

author

terms

channel

terms

collaboration_status

terms

collaboration_user

terms

communities

terms

country

terms

crawled_at

date_histogram

date

date_histogram

hashtags

terms

human_review_status

terms

influence

sum

infused_at

date_histogram

language

terms

profile

terms

sentiment

terms

site_type

terms

site

terms

source

terms

synthesio_rank

sum

tags

terms

title

terms

type

terms

urls

terms

Tags fields

Because each mention can have several tags, when you filter on tags and request a terms analytic on tags, an include option is set automatically so that tags not matching the filters do not appear in the result set.

Explicitly setting either the include or the exclude parameter in the payload prevents this automatic setting from taking place.

Hashtags

Hashtags are available for Twitter and Instagram mentions.

Reach

Computes reach values for the given filters, optionally split by analytics fields.

The aggs array follows the same rules as the analytics route. The reach computation is implicitly added as the last analytic field, so single-value analytics types (sum, avg, min, max, stats) cannot be used on this route.

Example Mention API reach request

POST https://rest.synthesio.com/mention/v2/reports/38298/_reach HTTP/1.1
Host: rest.synthesio.com
Authorization: Bearer YOUR_API_TOKEN
Content-Type: application/json

{
    "aggs": [
        {
            "field": "date",
            "interval": "month"
        },
        {
            "field": "sentiment"
        }
    ],
    "filters": {
        "period": {
            "begin": "2016-03-01T00:00:00Z",
            "end": "2016-05-01T00:00:00Z"
        },
        "countries": [
            "FRA"
        ]
    }
}

Response

{
  "data": {
    "2016-03": {
      "data": {
        "neutral": {
          "twitter": 151479204,
          "facebook": 56948,
          "blog": 626331,
          "mainstream": 14497
        },
        "positive": {
          "twitter": 6957833,
          "instagram": 80,
          "facebook": 17509,
          "blog": 678689,
          "forum": 2634,
          "mainstream": 28163
        },
        "unassigned": {
          "twitter": 8786733,
          "mainstream": 128
        },
        "negative": {
          "twitter": 30318,
          "blog": 1849,
          "mainstream": 2174
        }
      }
    },
    "2016-04": {
      "data": {
        "neutral": {
          "twitter": 549294,
          "facebook": 1048,
          "blog": 1049,
          "forum": 1487
        },
        "positive": {
          "twitter": 92164,
          "facebook": 6846,
          "blog": 2327,
          "mainstream": 1115
        },
        "unassigned": {
          "twitter": 50226
        },
        "negative": {
          "twitter": 23317,
          "mainstream": 48744
        }
      }
    }
  }
}
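Unlike the analytics response, the innermost reach buckets map what appear to be site types (twitter, facebook, blog, and so on) directly to reach values, with no "value" key. A minimal Python sketch that turns such a response into flat rows; reach_rows is a hypothetical helper name, and it assumes every non-leaf bucket holds a nested "data" object as in the example above. Summing rows across buckets is left to the reader, since this section does not say whether reach values are additive.

```python
from typing import Iterator, Tuple


def reach_rows(data: dict, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], int]]:
    """Yield ((bucket keys..., site type), reach) for every leaf of a _reach response."""
    for key, bucket in data.items():
        if "data" in bucket:
            yield from reach_rows(bucket["data"], path + (key,))
        else:
            for site_type, reach in bucket.items():
                yield path + (key, site_type), reach


# Usage with an excerpt of the response above:
sample = {"2016-04": {"data": {"negative": {"twitter": 23317, "mainstream": 48744}}}}
for keys, reach in reach_rows(sample):
    print(" / ".join(keys), reach)
```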