Emails API
This contains utilities, blockers, and comparers relevant to email addresses.
            mismo.lib.email.clean_email
clean_email(
    email: StringValue, *, normalize: bool = False
) -> StringValue
Clean an email address.
- convert to lowercase
- extract anything that matches r".*(\S+@\S+).*"
 
If normalize is True, an additional step of removing "." and "_" is performed.
This makes comparisons between two addresses more robust to noise.
For example, many email systems, such as Gmail, ignore "." in the user part of an address.
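The cleaning steps described above can be sketched in plain Python. This is only an illustration operating on ordinary strings, not the library's implementation (which works on ibis StringValue expressions). The name clean_email_sketch, the choice to return None when nothing matches, and the removal of "." and "_" from the whole address (domain included) are assumptions.

```python
import re
from typing import Optional

# Assumption: any run of non-whitespace characters around an "@" counts as
# the address, mirroring the \S+@\S+ group in the documented pattern.
_EMAIL_RE = re.compile(r"\S+@\S+")


def clean_email_sketch(email: Optional[str], *, normalize: bool = False) -> Optional[str]:
    """Plain-Python stand-in for the documented cleaning steps."""
    if email is None:
        return None
    # Step 1: lowercase. Step 2: extract the first address-like token.
    match = _EMAIL_RE.search(email.lower())
    if match is None:
        return None  # assumption: no address found yields a null
    cleaned = match.group(0)
    if normalize:
        # Assumption: "." and "_" are stripped from the whole address,
        # since the description does not limit this to the user part.
        cleaned = cleaned.replace(".", "").replace("_", "")
    return cleaned


print(clean_email_sketch("  Bob.Smith@Gmail.com "))
print(clean_email_sketch("Bob_Smith@gmail.com", normalize=True))
```

With normalize=True, "Bob_Smith@gmail.com" and "bob.smith@gmail.com" reduce to the same string under these assumptions, which is what makes the normalized form convenient for comparison.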
            mismo.lib.email.ParsedEmail
    A simple data class holding an email address that has been split into parts.
            mismo.lib.email.ParsedEmail.domain
  
      instance-attribute
  
domain: StringValue = nullif('')
The domain part of the email address, eg 'gmail.com'.
            mismo.lib.email.ParsedEmail.full
  
      instance-attribute
  
full: StringValue = full
The full email address, eg 'bob.smith@gmail.com'.
            mismo.lib.email.ParsedEmail.user
  
      instance-attribute
  
user: StringValue = nullif('')
The user part of the email address, eg 'bob.smith' in 'bob.smith@gmail.com'.
            mismo.lib.email.ParsedEmail.__init__
__init__(full: StringValue)
Parse an email address from the full string.
Does no cleaning or normalization. If you want that, use clean_email first.
| PARAMETER | DESCRIPTION |
|---|---|
| full | The full email address. TYPE: StringValue |
          
            mismo.lib.email.ParsedEmail.as_struct
as_struct() -> StructValue
Convert to an ibis struct.
| RETURNS | DESCRIPTION |
|---|---|
| StructValue | An ibis struct<full: string, user: string, domain: string>. |
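To make the shape of ParsedEmail concrete, here is a minimal plain-Python sketch of a class with the same three fields. It is an illustration on ordinary strings, not the library's code: the class name ParsedEmailSketch, splitting on the first "@", and using a dict in place of an ibis struct are all assumptions. Empty parts become None to mirror the nullif('') defaults shown for user and domain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedEmailSketch:
    """Plain-Python stand-in: holds the full address plus its two parts."""

    full: Optional[str]
    user: Optional[str] = field(init=False)
    domain: Optional[str] = field(init=False)

    def __post_init__(self) -> None:
        # Assumption: split on the first "@"; the real parsing rule may differ.
        user, _, domain = (self.full or "").partition("@")
        # Empty parts become None, mirroring the nullif('') defaults.
        self.user = user or None
        self.domain = domain or None

    def as_struct(self) -> dict:
        """Return the three fields as a dict, standing in for an ibis struct."""
        return {"full": self.full, "user": self.user, "domain": self.domain}


parsed = ParsedEmailSketch("bob.smith@gmail.com")
print(parsed.as_struct())
print(ParsedEmailSketch("bob@").as_struct())
```

As in the documented class, no cleaning or normalization happens here; the input is split exactly as given.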
          
            mismo.lib.email.match_level
match_level(
    e1: StructValue | StringValue,
    e2: StructValue | StringValue,
    *,
    native_representation: Literal[
        "integer", "string"
    ] = "integer",
) -> EmailMatchLevel
Match level of two email addresses.
| PARAMETER | DESCRIPTION |
|---|---|
| e1 | The first email address. If a string, it will be parsed and normalized. TYPE: StructValue \| StringValue |
| e2 | The second email address. If a string, it will be parsed and normalized. TYPE: StructValue \| StringValue |

| RETURNS | DESCRIPTION |
|---|---|
| level | The match level. TYPE: EmailMatchLevel |
          
            mismo.lib.email.EmailMatchLevel
    
              Bases: MatchLevel
How closely two email addresses of the form <user>@<domain> match.
Case is ignored, and dots and underscores are removed.
            mismo.lib.email.EmailMatchLevel.ELSE
  
      class-attribute
      instance-attribute
  
ELSE = 4
None of the other levels apply.
            mismo.lib.email.EmailMatchLevel.FULL_EXACT
  
      class-attribute
      instance-attribute
  
FULL_EXACT = 0
The full email addresses are exactly the same.
            mismo.lib.email.EmailMatchLevel.FULL_NEAR
  
      class-attribute
      instance-attribute
  
FULL_NEAR = 1
The full email addresses have a small edit distance.
            mismo.lib.email.EmailMatchLevel.USER_EXACT
  
      class-attribute
      instance-attribute
  
USER_EXACT = 2
The user parts of the email addresses are exactly the same.
            mismo.lib.email.EmailMatchLevel.USER_NEAR
  
      class-attribute
      instance-attribute
  
USER_NEAR = 3
The user parts of the email addresses have a small edit distance.
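The five levels can be read as a cascade checked from the strictest to the loosest. The sketch below illustrates that reading in plain Python on ordinary strings. It is not the library's implementation: the names EmailMatchLevelSketch, edit_distance, and match_level_sketch are hypothetical, the order of the checks is assumed to follow the integer values, and the threshold for a "small" edit distance (max_distance=1 here) is an assumption, since this reference does not state it.

```python
from enum import IntEnum


class EmailMatchLevelSketch(IntEnum):
    """Same names and integer values as the documented levels."""

    FULL_EXACT = 0
    FULL_NEAR = 1
    USER_EXACT = 2
    USER_NEAR = 3
    ELSE = 4


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalize(email: str) -> str:
    # Case is ignored, and dots and underscores are removed.
    return email.lower().replace(".", "").replace("_", "")


def match_level_sketch(e1: str, e2: str, *, max_distance: int = 1) -> EmailMatchLevelSketch:
    # Assumption: "small edit distance" means at most max_distance edits.
    n1, n2 = normalize(e1), normalize(e2)
    u1, u2 = n1.partition("@")[0], n2.partition("@")[0]
    if n1 == n2:
        return EmailMatchLevelSketch.FULL_EXACT
    if edit_distance(n1, n2) <= max_distance:
        return EmailMatchLevelSketch.FULL_NEAR
    if u1 == u2:
        return EmailMatchLevelSketch.USER_EXACT
    if edit_distance(u1, u2) <= max_distance:
        return EmailMatchLevelSketch.USER_NEAR
    return EmailMatchLevelSketch.ELSE


pairs = [
    ("Bob.Smith@gmail.com", "bobsmith@GMAIL.com"),
    ("bob.smith@gmail.com", "bob.smyth@gmail.com"),
    ("bob.smith@gmail.com", "bob.smith@yahoo.com"),
    ("bob.smith@gmail.com", "bob.smyth@yahoo.com"),
    ("alice@gmail.com", "bob@yahoo.com"),
]
for a, b in pairs:
    level = match_level_sketch(a, b)
    print(a, b, level.name, int(level))
```

Because lower integers mean closer matches, levels from this sketch can be compared or sorted directly.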
            mismo.lib.email.EmailsDimension
    A dimension of email addresses.
This is useful if each record contains a collection of email addresses. Two records are probably the same if they have a lot of email addresses in common.
            mismo.lib.email.EmailsDimension.__init__
__init__(
    column: str,
    *,
    column_parsed: str = "{column}_parsed",
    column_compared: str = "{column}_compared",
)
Initialize the dimension.
| PARAMETER | DESCRIPTION |
|---|---|
| column | The name of the column that holds an array of email addresses. TYPE: str |
| column_parsed | The name of the column that will be filled with the parsed email addresses. TYPE: str |
| column_compared | The name of the column that will be filled with the comparison results. TYPE: str |
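The default values of column_parsed and column_compared contain a "{column}" placeholder. A plausible reading, sketched below, is that the placeholder is replaced with the name passed as column. The helper resolve_column_name is hypothetical, and the use of str.format-style substitution is an assumption rather than something this reference confirms.

```python
def resolve_column_name(template: str, column: str) -> str:
    # Assumption: "{column}" is filled in with the input column name,
    # the way str.format would do it.
    return template.format(column=column)


print(resolve_column_name("{column}_parsed", "emails"))
print(resolve_column_name("{column}_compared", "emails"))
```

Under this reading, a dimension built on a column named "emails" would write to "emails_parsed" and "emails_compared" unless other names are given.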
          
            mismo.lib.email.EmailsDimension.compare
compare(t: Table) -> Table
Add a column with the best match between all pairs of email addresses.
            mismo.lib.email.EmailsDimension.prepare_for_fast_linking
prepare_for_fast_linking(t: Table) -> Table
Add a column with the parsed and normalized email addresses.
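The idea behind compare, taking the best match over all pairs of addresses from two records, can be sketched in plain Python. This is an illustration on lists of strings, not the library's implementation (which operates on ibis tables). The names exact_level and best_match_level are hypothetical, the stand-in comparer only distinguishes three of the five levels, and returning None when either list is empty is an assumption.

```python
from itertools import product
from typing import Callable, Optional, Sequence


def exact_level(e1: str, e2: str) -> int:
    """Tiny stand-in comparer: 0 = same address, 2 = same user, 4 = else."""
    a, b = e1.lower(), e2.lower()
    if a == b:
        return 0
    if a.partition("@")[0] == b.partition("@")[0]:
        return 2
    return 4


def best_match_level(
    emails1: Sequence[str],
    emails2: Sequence[str],
    level: Callable[[str, str], int] = exact_level,
) -> Optional[int]:
    # Lower numbers mean closer matches, so the best pair is the minimum
    # over the cross product of the two collections.
    levels = [level(a, b) for a, b in product(emails1, emails2)]
    return min(levels) if levels else None  # assumption: no pairs yields None


record_a = ["Bob@gmail.com", "b.smith@work.org"]
record_b = ["alice@example.com", "bob@GMAIL.com"]
print(best_match_level(record_a, record_b))
print(best_match_level(["bob@gmail.com"], ["bob@yahoo.com"]))
```

The cross product grows with the product of the two list lengths, which is usually acceptable because a single record rarely holds more than a handful of addresses.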